Graduate Student Supervision
Master's Student Supervision
Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay before the latest theses are added.
Recent advances in single-cell sequencing have made single-cell gene expression data widely available. We model the trajectories of cells in gene expression space using a diffusion and gradient drift process that also undergoes branching. Given the time marginals of this process, we recover its law on paths by solving an entropy minimization problem, where the optimization is over the set of all processes with the same branching mechanism and the given marginals. We prove that, among these, the process that minimizes the relative entropy with respect to a reference branching Brownian motion is the ground truth. This result was previously established for models without branching; this thesis extends it to the case with branching.
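For intuition, the non-branching setting that this thesis extends has a well-known discrete, two-time-point analogue: among all couplings of two given marginals, the one minimizing relative entropy with respect to a Brownian reference kernel can be computed by Sinkhorn iterations. The sketch below is only that analogue. It ignores branching, assumes 1-D point clouds, and uses a hypothetical diffusion coefficient `sigma2` and time step `dt`; it is not the method developed in the thesis.

```python
import numpy as np

def entropic_bridge(x, y, a, b, sigma2=1.0, dt=1.0, n_iter=500):
    """Static two-marginal entropy minimization via Sinkhorn (no branching).

    Minimizes KL(pi || R) over couplings pi with row sums a and column sums b,
    where R[i, j] is proportional to the Brownian transition kernel
    exp(-|x_i - y_j|^2 / (2 * sigma2 * dt)). sigma2 and dt are illustrative
    assumptions, not values from the thesis.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    a, b = np.asarray(a, float), np.asarray(b, float)
    # Gaussian (heat) kernel of a Brownian motion run for time dt.
    K = np.exp(-(x[:, None] - y[None, :]) ** 2 / (2.0 * sigma2 * dt))
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the coupling matches the marginals.
        u = a / (K @ v)
        v = b / (K.T @ u)
    # The minimizer has the form diag(u) K diag(v).
    return u[:, None] * K * v[None, :]

# Hypothetical example: three source points and two target points.
pi = entropic_bridge([0.0, 1.0, 2.0], [0.5, 1.5], [0.2, 0.5, 0.3], [0.6, 0.4])
```

Shrinking `sigma2 * dt` makes the coupling approach an unregularized optimal transport plan; the branching case treated in the thesis requires a reference branching Brownian motion instead of this plain Gaussian kernel.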
Understanding the dynamics of embryonic development is crucial to finding treatments for conditions such as aging and cancer. The development of an embryo can be represented as a curve in the Wasserstein space. Because the measurement techniques are destructive, this curve must be inferred from a series of static snapshots of gene expression profiles taken at n selected time points t₁, t₂, ..., tₙ. To obtain these snapshots, multiple embryos are allowed to develop until each of the desired time points is reached, and the gene expression profile is then captured. However, to reconstruct the curve we need to know which embryo had reached which developmental stage, and this information is lost during the measurements. To overcome this, a pairwise similarity function between profiles can be defined, and the profiles can be arranged so that more similar profiles are placed closer together. This is an instance of a broader class of problems known as the “seriation” problem. In this thesis, we investigate the feasibility of using the “spectral seriation” method proposed by Atkins et al. to recover the order of the profiles from their similarities, which enables the construction of the curve.
The gene expression profile of an embryo can be seen as a probability measure on a compact set. Although the exact measures are unknown, they can be approximated empirically using m samples. We demonstrate that, under reasonable assumptions and with sufficiently many time points and samples per time point, the spectral seriation method can be effective in sequencing the data. Additionally, we provide tools to determine the number of time points and samples per time point needed to achieve a desired error bound.
Furthermore, we investigate how the geometric properties of the curve representing embryonic development can affect our ability to sequence the data.
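The spectral seriation approach of Atkins et al. can be sketched in a few lines: form the graph Laplacian L = D - S of the similarity matrix S and sort the items by its Fiedler vector (the eigenvector of the second-smallest eigenvalue). The Gaussian similarity on 1-D points, the `bandwidth` value and the fixed shuffle below are illustrative assumptions, not the similarity between gene expression profiles used in the thesis.

```python
import numpy as np

def spectral_seriation(S):
    """Return an ordering of items from a symmetric similarity matrix S.

    Forms the Laplacian L = D - S and sorts items by the Fiedler vector.
    The ordering is defined only up to reversal, because the sign of an
    eigenvector is arbitrary.
    """
    S = np.asarray(S, float)
    L = np.diag(S.sum(axis=1)) - S
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    return np.argsort(fiedler)

# Hypothetical example: points on a line, observed in a scrambled order.
t = np.linspace(0.0, 1.0, 8)                 # hidden "developmental times"
perm = np.array([3, 7, 0, 5, 1, 6, 2, 4])    # order in which samples arrive
shuffled = t[perm]
bandwidth = 0.3
S = np.exp(-(shuffled[:, None] - shuffled[None, :]) ** 2 / bandwidth**2)
order = spectral_seriation(S)
recovered = shuffled[order]                  # monotone, up to reversal
```

With similarities estimated from finitely many samples per time point, the recovered order is only approximate; how many time points and samples are needed for a given error is the question the abstract describes.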
Spatial transcriptomics goes one step beyond single-cell RNA sequencing, yielding high-dimensional images of gene expression in a tissue, which offer the prospect of understanding cell signaling. Recent breakthroughs in spatial transcriptomics have drastically increased the area of tissue that can be profiled. However, sequencing all RNA molecules over large spatial areas is prohibitively expensive. One would like to reduce the number of RNAs sequenced, but doing so introduces excessive technical noise into the measured gene expression data. To counter this problem, we develop theory and algorithms for spatial transcriptomics denoising based on low-rank matrix recovery and spatial smoothing. We propose two novel procedures for estimating the true underlying gene expression image: (1) a low-rank maximum-likelihood-type estimator with graph-based total variation regularization, and (2) a switching procedure that switches between 'risky' estimators, which work well in practice, and 'safe' estimators, which have well-known properties. Our methods are backed by theoretical recovery guarantees, as well as tests on real data suggesting that the number of RNAs sequenced can be reduced more than 10-fold without significantly increasing recovery error. Finally, as a generalization of the analysis employed above, we establish some convergence rates for the estimation of structured discrete probability distributions.
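To make procedure (1) concrete, the sketch below evaluates one plausible objective of that shape: a negative log-likelihood for a low-rank rate matrix Λ = U Vᵀ plus a graph total variation penalty over spatially adjacent spots. The Poisson likelihood, the edge list of neighbouring spots and the names `graph_tv` and `penalized_poisson_objective` are assumptions for illustration; the abstract only says "maximum-likelihood-type", and this is not the thesis's estimator.

```python
import numpy as np

def graph_tv(M, edges):
    """Graph total variation: sum over adjacent spots (i, j) of the
    L1 distance between their gene expression rows."""
    return sum(np.abs(M[i] - M[j]).sum() for i, j in edges)

def penalized_poisson_objective(Y, U, V, edges, lam):
    """Objective for a candidate low-rank rate matrix Lam = U @ V.T.

    Y: spots x genes matrix of observed counts.
    U, V: rank-r factors (spots x r, genes x r); the rank constraint
        is enforced by this factorization.
    edges: hypothetical list of spatially adjacent spot pairs.
    lam: regularization strength.
    The Poisson likelihood is an assumption made for illustration.
    """
    Lam = U @ V.T
    if np.any(Lam <= 0):
        return np.inf  # Poisson rates must be positive
    # Poisson negative log-likelihood, dropping the constant log(Y!) term.
    nll = np.sum(Lam - Y * np.log(Lam))
    return nll + lam * graph_tv(Lam, edges)
```

An estimator would minimize this objective over U and V; the optimization scheme, the likelihood and the choice of `lam` are all outside the scope of this sketch.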
Developmental trajectory inference is the task of estimating the paths followed by cells over time as they develop (divide, die and differentiate) in a biological population. In this work we consider the problem of inferring developmental trajectories at single-cell resolution from time courses of dynamic populations that contain observations of cell developmental state, such as gene expression from single-cell RNA sequencing (scRNA-seq), and of shared ancestry through lineage tracing. We focus on the setting in which shared ancestry data is obtained from static DNA barcodes, such as those inserted via lentiviral vectors, observed over multiple time points. DNA barcode data allows us to cluster cells across the time course into clones (cells with a common ancestor at some earlier time), and hence such barcodes are referred to as multi-time clonal barcodes. Our research reveals that in populations with heterogeneous growth rates, sampling can induce a bias in the cell type proportions represented in the multi-time clonal barcodes. We prove the existence of this effect through a simple probabilistic analysis and validate our arguments against proportions observed in simulations. Furthermore, using simulated data, we show that this bias can affect fate probability predictions from state-of-the-art trajectory inference methods that incorporate both multi-time clonal barcode information and cell state. CoSpar is currently the only method of this type in the literature. However, we have also developed an extension of another method, LineageOT, which originally uses only cell state and clonal barcode information from single time points; our extension adapts it to use both single- and multi-time clonal barcode data. Although this extension improves on the original method when evaluated on simulated data, we find that its performance gains may be affected by the bias effect we have uncovered.
Given the potential for applying trajectory inference results to biomedical technologies and treatments, understanding and improving the accuracy of these methods is crucial. With these contributions, we aim to make researchers aware of this bias and to stimulate the development of methods that reduce its impact on trajectory inference.
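The mechanism behind the sampling bias can be illustrated with a deliberately simple toy model, which is our assumption and not the model analysed in the thesis: each clone is founded by one cell of a given type, clones of each type have a fixed size at each time point, and every cell is sampled independently with probability p. A clone appears in the multi-time barcodes only if at least one of its cells is sampled at both time points, so faster-growing clones are over-represented. The function names and all numbers below are hypothetical.

```python
def detection_prob(size, p):
    """Probability that at least one of `size` cells is sampled when each
    cell is sampled independently with probability p."""
    return 1.0 - (1.0 - p) ** size

def multi_time_clone_proportions(founder_props, size_t1, size_t2, p):
    """Clone-type proportions among clones observed at BOTH time points.

    Toy assumptions: deterministic clone sizes per type at each time, and
    independent sampling at the two time points.
    """
    weights = {
        k: founder_props[k] * detection_prob(size_t1[k], p) * detection_prob(size_t2[k], p)
        for k in founder_props
    }
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical example: equal numbers of founding clones, type A grows faster.
founders = {"A": 0.5, "B": 0.5}
size_t1 = {"A": 4, "B": 2}
size_t2 = {"A": 16, "B": 4}
props = multi_time_clone_proportions(founders, size_t1, size_t2, p=0.1)
# props["A"] is about 0.81, although only half of the founding clones are type A.
```

When both types grow at the same rate, the observed proportions match the founder proportions; the distortion comes entirely from size-dependent detection.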