Research Classification
Research Interests
Relevant Thesis-Based Degree Programs
Affiliations to Research Centres, Institutes & Clusters
Recruitment
Complete these steps before you reach out to a faculty member!
Check requirements
- Familiarize yourself with program requirements. You want to learn as much as possible from the information available to you before you reach out to a faculty member. Be sure to visit the graduate degree program listing and program-specific websites.
- Check whether the program requires you to seek commitment from a supervisor prior to submitting an application. For some programs this is an essential step while others match successful applicants with faculty members within the first year of study. This is either indicated in the program profile under "Admission Information & Requirements" - "Prepare Application" - "Supervision" or on the program website.
Focus your search
- Identify specific faculty members who are conducting research in your specific area of interest.
- Establish that your research interests align with the faculty member’s research interests.
- Read up on the faculty members in the program and the research being conducted in the department.
- Familiarize yourself with their work, read their recent publications and past theses/dissertations that they supervised. Be certain that their research is indeed what you are hoping to study.
Make a good impression
- Compose an error-free and grammatically correct email addressed to your specifically targeted faculty member, and remember to use their correct titles.
- Do not send non-specific, mass emails to everyone in the department hoping for a match.
- Address the faculty members by name. Your contact should be genuine rather than generic.
- Include a brief outline of your academic background, why you are interested in working with the faculty member, and what experience you could bring to the department. The supervision enquiry form guides you with targeted questions. Be sure to craft compelling answers to these questions.
- Highlight your achievements and why you are a top student. Faculty members receive dozens of requests from prospective students and you may have less than 30 seconds to pique someone’s interest.
- Demonstrate that you are familiar with their research:
- Convey the specific ways you are a good fit for the program.
- Convey the specific ways the program/lab/faculty member is a good fit for the research you are interested in/already conducting.
- Be enthusiastic, but don’t overdo it.
Attend an information session
G+PS regularly provides virtual sessions that focus on admission requirements and procedures, and offer tips on how to improve your application.
ADVICE AND INSIGHTS FROM UBC FACULTY ON REACHING OUT TO SUPERVISORS
These videos contain some general advice from faculty across UBC on finding and reaching out to a potential thesis supervisor.
Graduate Student Supervision
Doctoral Student Supervision
Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.
Probabilistic modeling of high-throughput sequencing data for enhanced understanding of DNA methylation heterogeneity (2025)
DNA methylation is a key epigenetic mechanism governing gene regulation and cellular identity. Advances in high-throughput sequencing technologies have enabled detailed investigation of methylation landscapes across single cells and complex tissue mixtures. However, the sparsity and noise inherent in single-cell data, as well as the signal distortion in enrichment-based platforms, pose major analytical challenges. This thesis presents two novel statistical frameworks to address these limitations and advance the computational toolkit for DNA methylation analysis.

The first contribution is vmrseq, a probabilistic method and software for detecting variably methylated regions from single-cell bisulfite sequencing data. vmrseq integrates a smoothing-based strategy for candidate region identification with hidden Markov modeling to account for spatial correlation and technical noise. Through extensive benchmarking on synthetic and experimental datasets, vmrseq demonstrates improved precision and biological relevance in identifying methylation heterogeneity, supporting downstream analyses such as unsupervised clustering and cell-type-specific marker discovery.

The second contribution is decemedip, a hierarchical Bayesian model and software for cell type deconvolution of enrichment-based methylation data such as MeDIP-seq. By leveraging reference panels derived from alternative platforms and modeling the complex relationship between methylation levels, CpG density, and read counts, decemedip enables accurate estimation of cell type proportions with uncertainty quantification. Its performance is validated through simulations, cross-platform comparisons, and real-world applications involving patient-derived xenografts and circulating cell-free DNA from cancer cohorts.

Together, these methods address critical gaps in the analysis of high-throughput DNA methylation data, enabling robust detection of epigenetic heterogeneity across biological contexts. The associated open-source software implementations provide practical tools for future epigenomic research and potential clinical applications.
Master's Student Supervision
Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.
Predicting sustained remission and maximal disease severity in pediatric Crohn's disease using machine learning (2026)
Crohn's disease (CD) is a chronic inflammatory condition affecting the gastrointestinal tract, and displays a growing prevalence in children. In affected children, it displays more heterogeneous disease trajectories and treatment responses than adult-onset cases, posing significant management challenges. While early aggressive treatment may benefit patients with severe trajectories, no objective method exists to identify high-risk children at diagnosis. This prognostic gap forces reliance on subjective clinical judgment, potentially delaying critical interventions. This study aimed to use machine learning models to predict two first-year outcomes in the Canadian Children IBD Network inception cohort: 1) sustained remission vs non-sustained remission, defined as maintaining a post-remission Weighted Pediatric Crohn's Disease Activity Index (wPCDAI) <12.5 without inflammatory episodes, and 2) maximal disease severity (remission/mild [post-diagnosis wPCDAI <40, indicating minimal inflammatory activity] vs moderate/severe [wPCDAI ≥40, indicating substantial inflammation and need for treatment escalation]). Nine algorithms were trained on baseline clinical, microbiome, and integrated clinical-microbiome datasets using repeated nested 3-fold cross-validation, with minimum redundancy maximal relevance feature selection, Bayesian hyperparameter optimization, and SHAP for model explainability. For sustained remission prediction, integrated models outperformed microbiome- or clinical-only models, with integrated logistic regression achieving the highest mean AUC (0.763); key features included initial treatment at diagnosis, disease location, and wPCDAI at diagnosis, as well as taxa known to play a role in CD such as Haemophilus and Lachnospiraceae.
For maximal disease severity prediction, microbiome models performed best, with Gaussian naïve Bayes reaching a mean AUC of 0.801 and highlighting microbes such as Clostridium and Veillonella as predictors of severe disease, while taxa such as Coprococcus and Romboutsia were associated with milder disease. Bayesian decision curve analysis of our top-performing models also demonstrated likely clinical utility at relevant decision thresholds. Our results suggest the potential of integrated machine learning approaches to support clinical decision-making in pediatric Crohn's disease. By enabling early identification of high-risk patients at diagnosis, this work paves the way for personalized treatment strategies that could improve long-term outcomes in this vulnerable population.
Application of supervised learning models to compare epigenetic predictors of gene expression across healthy breast cell types (2024)
Moderate associations have been identified between gene expression and DNA methylation variability, predicted transcription factor binding sites, and transcription factor expression across multiple human tissues, including healthy mammary cells and diverse cancer-related cellular contexts. However, previous models summarized DNA methylation primarily at promoter regions, ignoring methylation variability in other genomic regions. In the current thesis, I propose using Variably Methylated Regions (VMRs) for summarizing DNA methylation and hypothesize that models trained on VMR-derived features will outperform promoter-centered models in the prediction of individual gene expression across healthy mammary cell types. Results largely supported this hypothesis, with VMR-based models demonstrating a superior capacity for predicting standardized individual gene expression across held-out samples compared to their promoter counterparts. Additionally, the DNA methylation feature showed the highest contribution to the performance of VMR-based models. Despite challenges in generalizing association patterns to unseen data across all regression models, this thesis is the first study that uses and rigorously evaluates the contribution of VMR-derived features to explain gene expression variability across healthy mammary cell types.
A new data-driven framework for simulating Mendelian randomization data (2023)
Mendelian randomization (MR) is a causal inference method that allows biostatisticians to leverage DNA measurements to study causal effects with only observed data. Recent advancements, including two-sample summary-level Mendelian randomization (TSSLMR) and the IEU OpenGWAS database as a data source, have lowered the barrier for conducting MR studies and opened the opportunity to mine causal effects. In the first part of the thesis, I show that there is a mismatch between the characteristics of modern TSSLMR data and how articles that propose popular TSSLMR models conduct their simulations. Next, I propose my solution: a data-driven simulation framework for MR data that aims to be realistic, interpretable, and easy to use thanks to a complementary R package implementation. As for the results, I show that models perform far better in literature-based simulations than in more realistic simulations based on my proposed framework. Lastly, I warn that the mismatch between simulated and real data, along with the obtained results, may lead researchers to have overoptimistic expectations about model performance in real applications.
Evaluating omics-based tests with Bayesian Decision Curve Analysis (2023)
Omics-based tests (OBTs) combine high-dimensional omics features into clinical prediction models that predict diagnosis, prognosis, or treatment effects. Past incidences of premature implementation of OBTs into clinical trials have demonstrated the need for increased rigour in their clinical evaluation. However, their performance assessment is often limited to classification metrics such as sensitivity and specificity, with little regard for formal analysis of clinical decision-making. Decision curve analysis (DCA) complements classification metrics by combining classical assessment of predictive performance with the consequences of using a test or model to guide clinical decisions. In DCA, the best clinical decision strategy, such as diagnosing or treating based on an OBT, is the one that maximizes the concept of net benefit: the net number of true positives (or negatives) provided by a given clinical decision strategy. Before reaching real patients, we must be sufficiently confident that new OBTs actually provide superior clinical decision strategies, as compared to default, standard-of-care strategies. Trained on hundreds to thousands of features, OBTs are particularly prone to chance results. In this context, the present work develops parametric Bayesian approaches to DCA that allow uncertainty quantification around four fundamental concerns when evaluating OBT-guided clinical decision strategies: (i) which strategies are clinically useful, (ii) what is the best available decision strategy, (iii) direct pairwise comparisons between strategies, and (iv) what is the consequence of the current level of uncertainty. We evaluate the methods using simulation studies and present a comprehensive case study. We also provide an application to a recently-developed OBT for multi-cancer early detection. Software implementation of the method is freely available in the bayesDCA R package. Ultimately, the Bayesian DCA workflow may help clinicians and health policymakers make better-informed decisions when choosing and implementing clinical decision strategies based on OBTs.
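For readers unfamiliar with the net benefit quantity mentioned in the abstract above, the following is a minimal sketch of the standard (non-Bayesian) point estimate used in decision curve analysis. It is not the thesis's method and not the bayesDCA API; the function names and the toy data are hypothetical and chosen only for illustration. It assumes binary outcomes coded 0/1 and a decision threshold strictly between 0 and 1.

```python
def net_benefit(y_true, y_prob, threshold):
    """Point-estimate net benefit of treating everyone with predicted risk >= threshold.

    Standard DCA formula: TP/n - (FP/n) * threshold / (1 - threshold).
    This is an illustrative sketch, not the Bayesian approach from the thesis.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    # Count true and false positives among patients flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the default 'treat everyone' strategy at the same threshold."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data: 10 patients, 3 with the outcome.
outcomes = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
risks = [0.9, 0.8, 0.7, 0.2, 0.4, 0.1, 0.1, 0.3, 0.6, 0.05]

nb_model = net_benefit(outcomes, risks, threshold=0.2)
nb_all = net_benefit_treat_all(outcomes, threshold=0.2)
nb_none = 0.0  # 'treat none' has zero net benefit by definition
```

At a given threshold, a strategy is considered useful in this framework when its net benefit exceeds that of both default strategies (treat all and treat none). The thesis goes further by quantifying the uncertainty around such comparisons, which this sketch does not attempt.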
Membership Status
Program Affiliations
Academic Unit(s)