Research Classification
Research Interests
Relevant Thesis-Based Degree Programs
Affiliations to Research Centres, Institutes & Clusters
Research Options
Biography
Recruitment
Application of Machine Learning methods and Generative Models to 3D Computer Vision
Typically, successful applicants with MSc degrees have prior exposure to 3D Computer Vision and/or Deep Learning, evident from publications at Computer Vision / Graphics conferences (CVPR, ECCV, ICCV, NeurIPS, SIGGRAPH, WACV, BMVC, ICIP). For students applying directly to graduate school with BSc degrees, a publication record is a plus, and prior exposure to research environments or evidence of research projects is recommended.
Note: For graduate student positions, it is essential that you meet the department deadline, which is December 15th. Only then will you be considered as a potential candidate. Contacting me in advance is unlikely to make any difference, as long as you list me as a potential supervisor. If you intend to apply for graduate school, please consult the department website first.
Complete these steps before you reach out to a faculty member!
Check requirements
- Familiarize yourself with program requirements. You want to learn as much as possible from the information available to you before you reach out to a faculty member. Be sure to visit the graduate degree program listing and program-specific websites.
- Check whether the program requires you to seek commitment from a supervisor prior to submitting an application. For some programs this is an essential step while others match successful applicants with faculty members within the first year of study. This is either indicated in the program profile under "Admission Information & Requirements" - "Prepare Application" - "Supervision" or on the program website.
Focus your search
- Identify specific faculty members who are conducting research in your specific area of interest.
- Establish that your research interests align with the faculty member’s research interests.
- Read up on the faculty members in the program and the research being conducted in the department.
- Familiarize yourself with their work, read their recent publications and past theses/dissertations that they supervised. Be certain that their research is indeed what you are hoping to study.
Make a good impression
- Compose an error-free and grammatically correct email addressed to your specifically targeted faculty member, and remember to use their correct titles.
- Do not send non-specific, mass emails to everyone in the department hoping for a match.
- Address the faculty members by name. Your contact should be genuine rather than generic.
- Include a brief outline of your academic background, why you are interested in working with the faculty member, and what experience you could bring to the department. The supervision enquiry form guides you with targeted questions. Be sure to craft compelling answers to these questions.
- Highlight your achievements and why you are a top student. Faculty members receive dozens of requests from prospective students and you may have less than 30 seconds to pique someone’s interest.
- Demonstrate that you are familiar with their research:
  - Convey the specific ways you are a good fit for the program.
  - Convey the specific ways the program/lab/faculty member is a good fit for the research you are interested in/already conducting.
- Be enthusiastic, but don’t overdo it.
Attend an information session
G+PS regularly provides virtual sessions that focus on admission requirements and procedures, with tips on how to improve your application.
ADVICE AND INSIGHTS FROM UBC FACULTY ON REACHING OUT TO SUPERVISORS
These videos contain some general advice from faculty across UBC on finding and reaching out to a potential thesis supervisor.
Graduate Student Supervision
Doctoral Student Supervision
Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.
Repurposing large pretrained diffusion models for unsupervised visual understanding and efficient adaptation (2025)
Large pretrained text-conditioned image generation models learn a compositional and structured latent representation of visual concepts, showcasing their rich understanding of the world through their ability to generate diverse, coherent images. These models link text descriptions to visual concepts, unifying concepts across a range of conditions such as understanding the relationships between the text input and objects in a scene. This thesis explores how this link between text and visual concepts enables identifying consistent semantic regularities across images, where similar regions are mapped through the same text embedding. We show that this can be leveraged for tasks like semantic correspondence and estimating consistent keypoints, simply by optimizing the text embedding to activate highly in a specific region in the image for a given token. We also take advantage of the capacity of the model for one-shot personalization given only a single image. We leverage this by training hypernetworks to quickly estimate network weights for subject personalized generation, whose convergence is only possible due to the smooth underlying representation of concepts learned by these models. This PhD thesis leverages large pretrained diffusion models to address three key areas: semantic correspondence, unsupervised keypoint detection, and efficient hypernetwork-based adaptation for personalized model fine tuning. For semantic correspondence, we optimize text tokens to focus attention on specific regions in an image, leveraging the latent knowledge of large pretrained models to identify correspondences from a single image without additional supervision. For unsupervised keypoint detection, we localize text tokens across a collection of images to identify common keypoints, using a collection of images to focus the model on a specific concept, leveraging the knowledge within the pretrained model to generalize without ground truth keypoints. 
We also investigate hypernetwork-based methods for generating weights for large model personalization conditioned on a single image, providing an efficient alternative to compute-intensive optimization without requiring ground truth weights. This work highlights the versatility of diffusion models, extending their utility beyond image generation while proposing scalable, efficient solutions for downstream tasks of semantic correspondence, unsupervised keypoint estimation, and hypernetwork-based personalized model fine tuning.
Exploring explicit models for geometric point cloud learning (2024)
We are interested in processing point clouds (sets of unordered points), specifically in Euclidean space, such as 3D point clouds acquired from a range sensor (LiDAR) or 4D correspondence clouds in stereo matching. Point clouds play an increasingly essential role in many tasks due to their prevalence. However, it is notoriously challenging to process point clouds with deep neural networks because of their irregular data structure, the difficulty of encoding contextual information from nearby points, and the large compute requirement. This thesis addresses these challenges by forcing intermediate features or model parameters to carry specific meanings, such as attention and poses, leading to explicit representations. The meaning attached to an explicit representation allows features to be manipulated in traditional ways to solve target tasks. We refer to architectures with explicit representations as explicit models. Explicit models largely improve performance without massively scaling up training data or model size, because the explicit representation directly injects the prior knowledge needed by target tasks into neural networks without any learning. We explore explicit models for point cloud learning to perform robust estimation, stereo matching, segmentation, reconstruction and neural rendering. The thesis is organized into four chapters:
1. ACNe: An optimization-inspired network architecture that allows learning with point clouds contaminated with an abundance of outliers.
2. Canonical Capsules: An equivariant latent representation that consists of pose and pose-invariant features, enabling point cloud auto-encoding in unaligned datasets.
3. NeuralBF: A novel 3D instance proposal generation method, inspired by traditional bilateral filtering, for top-down instance segmentation of 3D point clouds.
4. PointNeRF++: A multi-scale, point-based NeRF architecture, allowing seamless integration of point-based representations with Neural Radiance Fields.
Across these four chapters, we show that explicit models largely improve point cloud learning, inspiring more future research in this domain. We conclude with a discussion of future work, practical tips on how to form an explicit model, and its role in the era of large foundation models.
Modularizing deep learning for geometry-aware registration and reconstruction (2023)
In this work, we explore the modularization of deep learning for geometry-aware registration and reconstruction, with a particular focus on camera registration and human reconstruction from videos. The traditional methods for these tasks have been challenged by deep learning approaches, but end-to-end learning can be limited in terms of generalization, transparency, and controllability. Modularization breaks the task into smaller subtasks and allows each to be addressed individually using traditional methods or deep learning techniques. Through modularization, we are able to embed knowledge from the real world, enabling better generalization, simpler and more effective learning, explainable and transparent models, and geometry-awareness. Specifically, this work consists of four major chapters, each presenting a modularized approach to solve a specific geometric problem. Firstly, a novel linearized multi-sampling method is proposed to enable better image alignment and learning. Secondly, the homography warping is modularized out of the pipeline, allowing optimization through the learned error for accurate sports field registration. Thirdly, by modularizing the robust estimation and 3D map from the pose estimation pipeline, the neural network can focus on learning accurate image correspondences. Finally, the modularization of human scene positioning and mesh skinning allows for the reconstruction of animatable human avatars from video. Overall, our work demonstrates the power of modularization, and we hope it will inspire future research on modularization and its potential applications to other areas.
Master's Student Supervision
Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.
Deblurring neural radiance fields by modeling camera imperfections and using RGB-event stereo (2024)
Neural radiance fields (NeRF) have brought progress in rendering photorealistic 3D reconstructions. However, they require clear images with correct camera poses. To address this problem, we propose to model camera imperfections that arise from the simple pinhole camera model and to combine RGB images with event camera data in a stereo setup. Specifically, in contrast to conventional approaches that enforce physical priors on a camera model, we model measurement variation across the exposure time using embeddings in a data-driven approach. To incorporate event data into the NeRF pipeline, we propose a learnable mapper that bridges the event camera measurement space with that of the RGB camera. To validate our method, we collected our own high-resolution RGB and event stereo dataset. For further validation, we utilize the EVIMOv2 dataset, which consists of limited indoor scenes.
Generative spectra modelling for galaxy redshift estimation (2024)
Knowledge of galaxy distance is important for cosmological studies. Recent deep learning-based approaches may not leverage the full potential of the neural network. We propose a generative model that reconstructs 1D electromagnetic spectra, with application to estimating astronomical redshift. The generative model is an auto-decoding neural field network. We represent each spectrum as a high-dimensional embedding, which the subsequent decoder converts into a reconstructed spectrum. We optimize the decoder in the rest frame simultaneously with the embedding by maximizing the structural similarity between the reconstructed and the observed spectra. We then train a classifier based on the reconstructed spectra for redshift classification. During inference, we fit the auto-decoder to the test spectra and then use the classifier to estimate redshift. Compared with a regressor, our classification model features a simplified optimization surface. We combine spectroscopic data from the zCOSMOS, the DEIMOS, and the VIPERS surveys as our dataset. We split the data into training and testing data and outperform the baseline by 0.6% on test redshift accuracy.
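For readers unfamiliar with the rest frame mentioned above, the following is a minimal sketch of the standard redshift relation, observed wavelength = rest wavelength × (1 + z). It is not the thesis's model; the function names `to_rest_frame` and `redshift_from_line` are hypothetical and chosen only for illustration.

```python
def to_rest_frame(observed_wavelengths, z):
    """Map observed wavelengths to the rest frame for a given redshift z."""
    return [w / (1.0 + z) for w in observed_wavelengths]


def redshift_from_line(observed_wavelength, rest_wavelength):
    """Redshift implied by one spectral line seen at observed_wavelength."""
    return observed_wavelength / rest_wavelength - 1.0


# A line with rest wavelength 6562.8 observed at twice that wavelength
# corresponds to z = 1 (illustrative numbers only).
z = redshift_from_line(13125.6, 6562.8)
rest = to_rest_frame([2000.0, 4000.0], 1.0)
```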
Weakly-supervised geometry-aware novel view synthesis (2024)
Enabling computers to understand and interpret visual information is crucial for the development of more sophisticated and interactive technologies. Learning structure, such as 3D shape, can help computers understand visual information more effectively. Our model disentangles object structure and appearance in a weakly supervised manner from multiview images within a single category. We extract a 3D point cloud from images and reconstruct consistent novel views by rendering the point clouds from different perspectives. Using a much simpler model and far fewer training examples than costly state-of-the-art diffusion models, we can recover 3D structure from single images of objects and quickly reconstruct them from unseen viewpoints. Our findings suggest that understanding 3D constraints of the real world can enhance performance on visual tasks and make models more robust and generalizable to a wider variety of inputs. This approach enables downstream tasks such as pose transfer and spatially-guided conditional image generation, and paves the way for commonsense reasoning. Our work has potential applications in augmented reality and visual effects, with further exploration of the model's capabilities and integration into broader systems for enhanced visual understanding.
Neural Fourier filter bank (2023)
We present a novel method to provide efficient and highly detailed reconstructions. Inspired by wavelets, we learn a neural field that decomposes the signal both spatially and frequency-wise. We follow the recent grid-based paradigm for spatial decomposition, but unlike existing work, encourage specific frequencies to be stored in each grid via Fourier feature encodings. We then apply a multi-layer perceptron with sine activations, taking these Fourier-encoded features in at appropriate layers so that higher-frequency components are accumulated on top of lower-frequency components sequentially, which we sum up to form the final output. We demonstrate that our method outperforms the state of the art regarding model compactness and convergence speed on multiple tasks: 2D image fitting, 3D shape reconstruction, and neural radiance fields. Our code is available at https://github.com/ubc-vision/NFFB.
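As a rough illustration of what a Fourier feature encoding looks like, here is a minimal sketch in plain Python. It is a generic sin/cos encoding at octave-spaced frequencies, not the thesis's grid-based implementation (see the repository above for that); the function name `fourier_features` and the frequency schedule 2^k·π are assumptions made for this example.

```python
import math


def fourier_features(coords, num_freqs=4):
    """Encode each coordinate with sin/cos pairs at frequencies 2^k * pi.

    coords: list of floats (e.g. one normalized 2D or 3D point).
    Returns a flat list of length len(coords) * 2 * num_freqs.
    """
    feats = []
    for c in coords:
        for k in range(num_freqs):
            w = (2.0 ** k) * math.pi  # assumed octave-spaced frequency
            feats.append(math.sin(w * c))
            feats.append(math.cos(w * c))
    return feats


# Lower k captures coarse variation, higher k captures fine detail.
encoded = fourier_features([0.25, 0.75], num_freqs=3)
```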
Bootstrapping human optical flow and pose (2022)
In this work, we propose a bootstrapping framework to enhance human optical flow and 3D human pose. We show that, for videos involving humans in scenes, we can improve both the optical flow and the pose estimation quality of humans by considering the two tasks at the same time. Generic optical flow methods perform better on humans when fine-tuned on human-centric scenes showing that the focus should be on humans when the task is human oriented. On the other hand, an overlooked assumption in recent 3D human pose estimation methods is temporal consistency. As such, we make use of existing human pose estimators and optical flow networks and improve their performance by benefitting from each other. In more detail, we optimize the pose and optical flow networks to, at inference time, agree with each other. We show that this results in state-of-the-art performance on the Human 3.6M and 3D Poses in the Wild datasets, as well as a human-related subset of the Sintel dataset, both in terms of pose estimation accuracy and the optical flow accuracy at human joint locations.
Human pose and stride length estimation (2021)
In this thesis, we develop computer vision methods for human body pose and stride length estimation. We first describe a framework for estimating the stride length of a walking subject from video using a multi-view camera setup. We specifically look into its utility in diagnosing Parkinson's disease. Using per-frame 3D pose estimates and an analysis of foot movement, we determine the length of the stride. Parkinson's diagnosis partly relies on stride length information; we claim that our method can be helpful in diagnosis. The current practice in the medical field is to estimate stride with sensors that are complicated and fundamentally flawed, as they tend to affect the gait of the subjects using them. A benefit of our method is that cameras are relatively cheap, easily obtainable, and only need to be set up once. We also describe work done in improving the state of the art in human pose estimation. We first propose a pose refinement method that enhances state-of-the-art methods. Through analysis of our refiner, we show a flaw inherent in the human body model---the inaccuracy in the typical shape-to-pose regressor (joint regressor)---for a standard human pose dataset and show that the results on the top methods are actually being underreported. This flaw results in a situation where the ground truth joints are unsatisfiable with biologically plausible poses. We then address this flaw by modifying a part of the human body model. We reevaluate top state-of-the-art methods and show these models perform better with this modification without retraining.
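To make the idea of deriving stride length from per-frame 3D foot positions concrete, here is a minimal sketch under stated assumptions. It is not the thesis's algorithm: it assumes a foot is in ground contact whenever its per-frame displacement falls below a threshold, and takes the stride as the distance between consecutive contact positions of the same foot. The function name `stride_lengths` and the `speed_threshold` value are hypothetical.

```python
import math


def stride_lengths(foot_positions, speed_threshold=0.02):
    """Estimate stride lengths from one foot's 3D track.

    foot_positions: list of (x, y, z) tuples, one per frame, in metres.
    speed_threshold: assumed per-frame displacement below which the foot
        is treated as planted on the ground.
    """
    contacts = []        # first position of each detected stance phase
    in_contact = False
    for prev, cur in zip(foot_positions, foot_positions[1:]):
        displacement = math.dist(prev, cur)
        if displacement < speed_threshold:
            if not in_contact:
                contacts.append(cur)
                in_contact = True
        else:
            in_contact = False
    # Stride: distance between consecutive plants of the same foot.
    return [math.dist(a, b) for a, b in zip(contacts, contacts[1:])]


# Synthetic track: planted at the origin, swings forward, planted again.
track = [(0.0, 0.0, 0.0)] * 3 + [(0.5, 0.0, 0.0)] + [(1.0, 0.0, 0.0)] * 3
strides = stride_lengths(track)
```

Real pose estimates are noisy, so a practical version would likely smooth the track before thresholding.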