Doctor of Philosophy in Chemistry (PhD)
Liquid Chromatography-Mass Spectrometry-Based Untargeted Metabolomics for Single Cell Profiling
Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.
Untargeted metabolomics studies the complete set of small metabolic molecules in a given biological system. Liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS) is currently the most prominent analytical platform for untargeted metabolomics owing to its high sensitivity, specificity, and metabolic coverage. However, the current LC-MS-based untargeted metabolomics workflow has limited performance in detecting and quantifying trace-level metabolites with poor chromatographic peak shapes. It is also hard to differentiate signals of real metabolites from noise and background. During my Ph.D., I developed a suite of analytical and bioinformatic tools to address the critical challenges in metabolomics data acquisition and data processing. In this thesis, Chapters 2 to 4 focus on the development of data acquisition methods. Specifically, Chapters 2 and 3 describe a detailed comparison of the existing data acquisition modes from two different aspects. Chapter 4 describes the development of a novel data acquisition strategy, DaDIA, to increase metabolomic coverage. Chapters 5 to 9, in turn, describe the development of data processing methods. In Chapter 5, a data processing parameter optimization tool, Paramounter, is introduced to rapidly and accurately determine the best peak-picking parameters for five commonly used data processing programs. In Chapter 6, the five most commonly used metabolomics data processing programs are compared to mechanistically explain the differences in their metabolic feature extraction performance. In Chapter 7, a novel data processing program, JPA, is developed to efficiently extract metabolic features by combining multiple peak-picking algorithms. In Chapter 8, a deep learning-based software, EVA, is created to automatically remove the false positive features generated from background noise.
In Chapter 9, I developed a bioinformatic workflow, ISFrag, to automatically recognize and remove the false positive metabolic features originating from in-source fragmentation.
Quantitative determination of metabolite concentrations in biological samples is fundamental to biological and clinical research. Metabolomics analyzes the entire set of metabolites in a given biological system. It is an emerging technology in the post-genomic era to interrogate cellular biochemistry, perform diagnostic testing, stratify patient populations, and characterize biochemical mechanisms of disease. Recent successes in metabolomics demonstrate the central role of mass spectrometry (MS) in small molecule quantification, owing to its high sensitivity, high throughput, and broad metabolic coverage. Even though diverse MS instruments have been developed for metabolite quantification, it is still challenging to quantify the entire metabolome accurately and precisely. Besides MS hardware advances, quantitative metabolomics also requires extensive efforts in other analytical and bioinformatic methodology development. For a given MS platform, analytical method development focuses on laboratory practice, including sample handling, metabolome extraction, and data acquisition. In comparison, bioinformatic method development emphasizes computational data processing, such as data calibration, data curation, and statistical analysis. The subsequent chapters detail the development of analytical and bioinformatic solutions for quantitative metabolomics that improve metabolic coverage, analytical accuracy, analytical precision, and statistical analysis. Lastly, this thesis describes a metabolomics study of mouse brain regional differences in metabolism between males and females. Collectively, my studies of quantitative metabolomics improve quantitative performance, deepen our knowledge of the MS-based quantification process, and facilitate the generation of confident biological conclusions.
Metabolomics is an emerging omics study that aims to characterize the entire metabolome in a biological system. Mass spectrometry (MS) is a preferred analytical technique for metabolomics research owing to its high sensitivity and highly specific structural information content. However, it remains a longstanding challenge to accurately translate MS signals into chemical language, thus hindering the downstream biological interpretation. This dissertation presents computational strategies contributing to tandem mass (MS/MS) spectral interpretations with the aid of machine learning and statistical approaches. Chapter 1 provides a holistic introduction to MS-based metabolomics and the developed bioinformatic tools for uncovering the unidentified metabolic features in untargeted metabolomics. Chapter 2 describes a novel MS/MS spectral comparison algorithm, Core Structure-based Search (CSS), which searches for structural analogs of unknown MS/MS spectra within the existing MS/MS reference libraries. CSS shows improved correlations with structural similarity in large-scale benchmarking. In Chapter 3, a deep learning-based tool is developed for automated extraction of steroid-like metabolic features from the untargeted metabolomics data by classifying MS/MS fragmentation patterns. This biology-driven metabolomics pipeline enables metabolite characterization and discovery on the compound class level. Chapter 4 depicts the purification of chimeric MS/MS spectra using a random forest model. Purified MS/MS spectra are demonstrated to yield better spectral matching results against MS/MS reference libraries. Chapter 5 describes the systematic analysis of radical fragment ions in MS/MS through MS/MS database mining. Larger-than-expected percentages of radical ions are present in collision-induced dissociation-based MS/MS; relationships between radical ion percentages and compound classes, chemical substructures and collision energies are also investigated.
Chapter 6 discusses a standalone platform, BUDDY, for molecular formula discovery via bottom-up MS/MS interrogation and experiment-specific global peak annotation. BUDDY further integrates machine-learned ranking and significance control, showing improved formula annotation accuracy and lower computational cost than other benchmarking tools. Applying BUDDY to repository-scale recurrent unidentified MS/MS spectra, we discovered >5,000 chemical database-unarchived molecular formulae with high confidence. Overall, this dissertation demonstrates computational contributions to enriching structural insights into MS-based untargeted metabolomics data, thus paving the way for understanding biological mechanisms behind various health disorders and diseases from the perspective of small molecules.
Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.
This thesis delves into the development and application of metabolomics, a discipline focused on the comprehensive study of metabolites within biological systems. The research is divided into two interconnected parts: methodology development for metabolomics data processing and its subsequent application in plant stress physiology. The first project tackles the challenge of computational variation in untargeted metabolomics, which arises because current data processing methods cannot fully handle complex LC-MS data. An in-depth exploration led to the identification of sources and causes of computational variation, followed by the development of novel methodologies to mitigate these challenges. These methodologies, including data processing parameter optimization and a machine learning program, successfully reduced computational variation, thereby enhancing the quantitative precision of untargeted metabolomics. The second segment of the thesis applies these methodologies to study the salinity stress response in alfalfa (Medicago sativa L.). Comprehensive analysis of the plant's metabolic alterations, coupled with transcriptomics data, revealed significant pathways and mechanisms of salinity response. The integration of multi-omics data provided a deeper understanding of the complex interplay between genes and metabolites. The research advances the field of metabolomics, providing improved data processing methodology and valuable insights into plant stress physiology. Future work may expand these findings towards personalized medicine, disease diagnosis, and precision treatment.
The full abstract for this thesis is available in the body of the thesis, and will be available when the embargo expires.
As the most recently emerged “omics”, metabolomics has attracted attention in human health studies by measuring thousands of small-molecule metabolites in a wide range of biological samples. As the downstream products of biological pathways, metabolites are regarded as the closest link to phenotypes. Small stimuli in the human body can cause relatively large changes in metabolite levels. Liquid chromatography-mass spectrometry (LC-MS) is the mainstay of metabolomics research due to its high throughput, sensitivity, and reliable analysis of metabolites. Nevertheless, two of the main challenges in LC-MS-based metabolomics are 1) how to apply metabolomics in studying human health and 2) apart from commonly used biological samples, including serum, plasma, and urine, how to develop methodologies for new biological samples that can be adapted to specific human health research. To address those challenges, in Chapter 2, I integrated metabolomics with metagenomics to examine human gut health. A 13-species metagenomic signature was selected by random forest machine learning and achieved high diagnostic accuracy in differentiating hepatic decompensation in NAFLD-related cirrhosis. The signature was cross-validated by metabolomics. 32 metabolites from serum and 15 metabolites from feces were found to be significantly linked to the 13 discriminatory species, suggesting that the identified discriminatory species may play important roles in the progression from compensated to decompensated cirrhosis. This multi-omics study yields new avenues for identifying novel targets for therapy and microbial biomarkers of hepatic decompensation, a worldwide human disease. In Chapter 3, I integrated plasma metabolomics and proteomics to examine the health conditions of highly trained females and males following acute, severe-intensity exercise. Metabolomic and proteomic homeostasis were substantially perturbed.
Through statistical analysis, some metabolites and proteins were found to be closely linked to high-intensity exercise. This multi-omics study was a powerful tool to study molecular responses to acute exercise and provided new insight into exercise-bolstered human health. In Chapter 4, I developed a new methodology to track skin secretion. Our high-performance workflow was readily applied to a wide range of skin metabolomics research to gain a better understanding of the molecular signatures on skin that link to human health and disease.
For this work, two areas of metabolomics were investigated relating to the fundamentals of the field and application to different experiments. The first chapter was an assessment and comparison of the MS/MS spectra generated from different acquisition modes. The chosen acquisition modes were data-dependent acquisition (DDA), data-independent acquisition (DIA), and enhanced in-source fragmentation (eISF) at a range of collision energies. The data were obtained by performing untargeted metabolomics on a urine sample and a standard mixture solution on an LC-MS platform while also covering multiple ionization modes. The spectra from the three modes were compared against each other on several factors that relate to the various ways MS/MS spectra are used in a metabolomics workflow. These comparisons involved investigating the spectral purity, quality of reference matching results, structural similarity, and de novo annotation performance. It was found that DDA performed the best, with eISF and DIA following. It was seen that eISF performed on par with or slightly better than DIA at higher collision energies. This indicates that the collision energy used will have a notable impact on the performance of the mode. The second chapter involves the metabolomics of pancreatic cell samples. The purpose of this was to compare the metabolic profiles of the control and treated groups. The control was regular cancer cells from the MiaPaCa2 cell line, while the treated groups had specific genes knocked out. The investigation was performed to gain insight into which metabolic pathways the knocked-out genes were involved in. Using an LC-MS platform, it was found that 12 metabolites showed significant intensity differences between the groups. A literature review of these compounds highlighted possible metabolic pathways affected, such as polyamine metabolism.
The last chapter focuses on a lipidomics experiment that was performed on the bacterium Thermotoga maritima to investigate the lipid content of the bacterial membranes. The samples relating to each fraction were run through the same LC-MS platform as above. It was seen that there were three significantly different lipids apart from the fatty acid, phosphatidylethanolamine, and phosphatidylinositol lipid classes. These classes have all been shown to be involved in membrane stability and transport.