Helge Rhodin

Assistant Professor

Research Interests

Shape Recognition and Computer Graphics
Virtual Reality
Neuronal Systems
Computer Graphics
Computer Vision
Machine Learning

Research Methodology

Human motion capture equipment


Postdoctoral Fellows
Any time / year round
I support public scholarship, e.g. through the Public Scholars Initiative, and am available to supervise students and Postdocs interested in collaborating with external partners as part of their research.
I support experiential learning experiences, such as internships and work placements, for my graduate students and Postdocs.
I am open to hosting Visiting International Research Students (non-degree, up to 12 months).

Complete these steps before you reach out to a faculty member!

Check requirements
  • Familiarize yourself with program requirements. You want to learn as much as possible from the information available to you before you reach out to a faculty member. Be sure to visit the graduate degree program listing and program-specific websites.
  • Check whether the program requires you to seek commitment from a supervisor prior to submitting an application. For some programs this is an essential step while others match successful applicants with faculty members within the first year of study. This is either indicated in the program profile under "Admission Information & Requirements" - "Prepare Application" - "Supervision" or on the program website.
Focus your search
  • Identify specific faculty members who are conducting research in your specific area of interest.
  • Establish that your research interests align with the faculty member’s research interests.
    • Read up on the faculty members in the program and the research being conducted in the department.
    • Familiarize yourself with their work, read their recent publications and past theses/dissertations that they supervised. Be certain that their research is indeed what you are hoping to study.
Make a good impression
  • Compose an error-free and grammatically correct email addressed to your specifically targeted faculty member, and remember to use their correct titles.
    • Do not send non-specific, mass emails to everyone in the department hoping for a match.
    • Address the faculty members by name. Your contact should be genuine rather than generic.
  • Include a brief outline of your academic background, why you are interested in working with the faculty member, and what experience you could bring to the department. The supervision enquiry form guides you with targeted questions. Be sure to craft compelling answers to these questions.
  • Highlight your achievements and why you are a top student. Faculty members receive dozens of requests from prospective students and you may have less than 30 seconds to pique someone’s interest.
  • Demonstrate that you are familiar with their research:
    • Convey the specific ways you are a good fit for the program.
    • Convey the specific ways the program/lab/faculty member is a good fit for the research you are interested in/already conducting.
  • Be enthusiastic, but don’t overdo it.
Attend an information session

G+PS regularly provides virtual sessions that focus on admission requirements and procedures, with tips on how to improve your application.


Graduate Student Supervision

Master's Student Supervision

Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.

Explicit and implicit warping for accurate human pose estimation and low-latency neural rendering (2023)

Deep neural networks have become an integral part of modern advances in the field of computer vision. However, these solutions are not practical when relying on increasingly large neural networks and diverse datasets to scale. Prior works demonstrate that embedding domain/application-specific knowledge in both the architecture design and training procedure is one way to improve scalability. In this thesis, we propose two methods that leverage domain knowledge, defined through explicit and implicit warping, to create more data- and runtime-efficient networks in two applications. First, we compute an explicit warping to disentangle the learning of camera intrinsic parameters from the human pose estimation pipeline. Our explicit warping takes into account the region of interest and the camera's focal length to define a perspective-correct crop. By including this as a preprocessing step or end-to-end component in the network, we significantly increase performance, especially in cases where the subject is near the boundary of the image. Second, we leverage the knowledge that sequential frames in talking-head video conferencing have significant visual overlap. Therefore, we design a simple and effective implicit warping strategy between timesteps to greatly decrease the latency and increase the framerate of talking-head neural rendering. Our proposed methods demonstrate significant improvements in accuracy and latency in their respective applications.


M-NeRF : model-based human reconstruction from scratch with mirror-aware neural radiance fields (2023)

Human motion capture either requires multi-camera systems or is unreliable using single-view input due to depth ambiguities. Meanwhile, mirrors are readily available in urban environments and can take the role of additional views. When picturing a person in front of a mirror, the mirror image provides a second view of the person using only a single camera. Prior work has hence exploited this additional constraint to improve 3D human pose reconstruction. Going beyond existing mirror approaches, we utilize mirrors for learning a complete body model, including shape and appearance. Our main contribution is extending articulated neural radiance fields (NeRFs) to include a notion of a mirror and making it sample-efficient. We integrate this into an entire system that succeeds without any 3D annotation by automatically calibrating the camera, estimating mirror orientation, and subsequently lifting 2D keypoint detections to a 3D skeleton pose that is used to condition the mirror-aware NeRF. We empirically demonstrate the benefit of learning a body model and accounting for mirror-occlusion in challenging mirror scene setups. We show continuous improvements on time-varying articulated 3D joint estimation, reconstruct the body geometry from only mirror images and 2D detections, and synthesize novel views from unobserved viewpoints.
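
The geometric idea behind treating a mirror as a second view can be illustrated with a small sketch. It is not taken from the thesis: the function name `reflect_point`, the plane parameterization, and the sample coordinates are all assumptions made for illustration. It uses the standard formula for reflecting a point across a plane with unit normal n and offset d, namely p' = p - 2(n·p - d)n, to obtain the "virtual" position at which a body joint appears behind the mirror.

```python
def reflect_point(p, normal, offset):
    """Reflect 3D point p across the plane {x : normal . x = offset}.

    `normal` is assumed to be a unit vector (hypothetical helper, not from the thesis).
    """
    # Signed distance from the point to the mirror plane.
    dist = sum(pi * ni for pi, ni in zip(p, normal)) - offset
    # Move the point twice that distance along the normal.
    return tuple(pi - 2.0 * dist * ni for pi, ni in zip(p, normal))


# Assumed setup: a mirror lying in the plane z = 2, camera at the origin.
mirror_normal = (0.0, 0.0, 1.0)
mirror_offset = 2.0

joint = (0.3, 1.2, 1.5)  # a real 3D joint position in metres (made up)
virtual_joint = reflect_point(joint, mirror_normal, mirror_offset)
```

Reflecting the virtual joint a second time returns the original joint, which is a quick sanity check that the plane parameters are consistent.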


Pedestrian intent estimation through visual attention and time and memory conscious u-shaped networks for training neural radiance fields (2022)

When people cross the street, they make a series of movements that are indicative of their attention, intention, and comprehension of the roadside environment. These patterns in attention are linked to individual characteristics which are often neglected by autonomous vehicle prediction algorithms. We make two strides towards more personalized pedestrian modelling. First, we design an outdoor data study to collect behavioural signals such as pupil, gaze, head, and body orientation from an ego-centric human point of view. We gather this data over a range of diverse variables including age, gender, geographical context, crossing type, time of day, and the presence of a companion. In order for simulation engines to be able to leverage such dense data, efficient 3D human and scene reconstruction algorithms must be available. The increased resolution and model-free nature of Neural Radiance Fields for large scene reconstruction and human motion synthesis come at the cost of high training times and excessive memory requirements. Little has been done to reduce the resources required at training time in a manner that supports both dynamic and static tasks. Our second contribution takes the form of an efficient method which provides a reduction of the memory footprint, improved accuracy, and reduced amortized processing time both during training and inference. We demonstrate that the conscious separation of view-dependent appearance and view-independent density estimation improves novel view synthesis of static scenes as well as dynamic human shape and motion. Further, we show that our method, UNeRF, can be used to augment other state-of-the-art reconstruction techniques to further accelerate and enhance the improvements which they present.


AudioViewer: learning to visualize sound (2021)

Sensory substitution can help persons with perceptual deficits. In this work, we attempt to visualize audio with video. Our long-term goal is to create sound perception for hearing impaired people, for instance, to facilitate feedback for training deaf speech. Different from existing models that translate between speech and text or text and images, we target an immediate and low-level translation that applies to generic environment sounds and human speech without delay. No canonical mapping is known for this artificial translation task. Our design is to translate from audio to video by compressing both into a common latent space with a shared structure. Our core contribution is the development and evaluation of learned mappings that respect human perception limits and maximize user comfort by enforcing priors and combining strategies from unpaired image translation and disentanglement. We demonstrate qualitatively and quantitatively that our AudioViewer model maintains important audio features in the generated video and that generated videos of faces and numbers are well suited for visualizing high-dimensional audio features since they can easily be parsed by humans to match and distinguish between sounds, words, and speakers.


Human pose and stride length estimation (2021)

In this thesis, we develop computer vision methods for human body pose and stride length estimation. We first describe a framework for estimating the stride length of a walking subject from video using a multi-view camera setup. We specifically look into its utility in diagnosing Parkinson's disease. Using per-frame 3D pose estimates and an analysis of foot movement, we determine the length of the stride. Parkinson's diagnosis partly relies on stride length information; we claim that our method can be helpful in diagnosis. The current practice in the medical field is to estimate stride with sensors that are complicated and fundamentally flawed, as they tend to affect the gait of the subjects using them. A benefit of our method is that cameras are relatively cheap, easily obtainable, and only need to be set up once. We also describe work done in improving the state of the art in human pose estimation. We first propose a pose refinement method that enhances state-of-the-art methods. Through analysis of our refiner, we show a flaw inherent in the human body model, namely the inaccuracy in the typical shape-to-pose regressor (joint regressor), for a standard human pose dataset and show that the results on the top methods are actually being underreported. This flaw results in a situation where the ground truth joints are unsatisfiable with biologically plausible poses. We then address this flaw by modifying a part of the human body model. We reevaluate top state-of-the-art methods and show these models perform better with this modification without retraining.
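
One plausible way to turn per-frame 3D foot positions into a stride length is sketched below. The abstract does not describe the actual foot-movement analysis, so everything here is an assumption for illustration: the function name `stride_lengths`, the speed threshold used to detect when the foot is planted, and the synthetic trajectory. The sketch marks frames where the foot is nearly stationary as stance phases and measures the distance between consecutive stance positions of the same foot.

```python
import math


def stride_lengths(foot_positions, frame_rate, speed_threshold=0.1):
    """Distances (metres) between consecutive stance phases of one foot.

    foot_positions: per-frame (x, y, z) tuples in metres.
    speed_threshold: assumed speed in m/s below which the foot counts as planted.
    """
    def centre(points):
        # Mean position of the frames in one stance phase.
        return tuple(sum(axis) / len(points) for axis in zip(*points))

    stance_centres, current = [], []
    for prev, pos in zip(foot_positions, foot_positions[1:]):
        speed = math.dist(prev, pos) * frame_rate
        if speed < speed_threshold:
            current.append(pos)
        elif current:
            # The foot has lifted off: close the current stance phase.
            stance_centres.append(centre(current))
            current = []
    if current:
        stance_centres.append(centre(current))
    return [math.dist(a, b) for a, b in zip(stance_centres, stance_centres[1:])]


# Synthetic trajectory (made up): planted, one swing of 1.2 m, planted again.
trajectory = (
    [(0.0, 0.0, 0.0)] * 5
    + [(0.3, 0.0, 0.05), (0.6, 0.0, 0.1), (0.9, 0.0, 0.05)]
    + [(1.2, 0.0, 0.0)] * 5
)
strides = stride_lengths(trajectory, frame_rate=30)
```

On real pose estimates the positions are noisy, so the threshold would need tuning and the trajectory would likely need smoothing first.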


Learned acoustic reconstruction using synthetic aperture focusing (2021)

Navigating and sensing the world through echolocation in air is an innate ability in many animals for which analogous human technologies remain rudimentary. Many engineered approaches to acoustic reconstruction have been devised which typically require unwieldy equipment and a lengthy measurement process, and are largely not applicable in air or in everyday human environments. Recent learning-based approaches to single-emission in-air acoustic reconstruction use simplified hardware and an experimentally-acquired dataset of echoes and the geometry that produced them to train models to predict novel geometry from similar but previously-unheard echoes. However, these learned approaches use spatially-dense representations and attempt to predict an entire scene all at once. Doing so requires a tremendous abundance of training examples in order to learn a model that generalizes, which leaves these techniques vulnerable to over-fitting. We introduce an implicit representation for learned in-air acoustic reconstruction inspired by synthetic aperture focusing techniques. Our method trains a neural network to relate the coherency of multiple spatially-separated echo signals, after accounting for the expected time-of-flight along a straight-line path, to the presence or absence of an acoustically reflective object at any sampling location. Additionally, we use signed distance fields to represent geometric predictions which provide a better-behaved training signal and allow for efficient 3D rendering. Using acoustic wave simulation, we show that our method yields better generalization and behaves more intuitively than competing methods while requiring only a small fraction of the amount of training data.
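
The time-of-flight compensation mentioned above can be illustrated with classical delay-and-sum focusing. Note that this is a stand-in: the thesis trains a neural network on the compensated signals, whereas the sketch below simply sums them. The function names, the speed of sound, the sensor layout, and the impulse signals are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def time_of_flight(emitter, point, receiver, c=SPEED_OF_SOUND):
    """Straight-line travel time emitter -> point -> receiver, in seconds."""
    return (math.dist(emitter, point) + math.dist(point, receiver)) / c


def focused_response(signals, receivers, emitter, point, sample_rate, c=SPEED_OF_SOUND):
    """Sum each receiver's sample at the arrival time expected for `point`."""
    total = 0.0
    for signal, receiver in zip(signals, receivers):
        idx = round(time_of_flight(emitter, point, receiver, c) * sample_rate)
        if 0 <= idx < len(signal):
            total += signal[idx]
    return total


# Assumed setup: one emitter, two receivers, one point reflector 1 m away.
emitter = (0.0, 0.0, 0.0)
receivers = [(-0.1, 0.0, 0.0), (0.1, 0.0, 0.0)]
reflector = (0.0, 0.0, 1.0)
sample_rate = 48_000

# Idealized echoes: a unit impulse at each receiver's expected arrival sample.
signals = []
for receiver in receivers:
    signal = [0.0] * 512
    signal[round(time_of_flight(emitter, reflector, receiver) * sample_rate)] = 1.0
    signals.append(signal)

# The echoes add up coherently at the reflector and miss at an empty location.
on_target = focused_response(signals, receivers, emitter, reflector, sample_rate)
off_target = focused_response(signals, receivers, emitter, (0.5, 0.0, 1.0), sample_rate)
```

Evaluating `focused_response` over a grid of sampling locations would give a coarse reflectivity map, which is the kind of per-location query that an implicit representation answers.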


