Doctor of Philosophy in Computer Science (PhD)
Design and complexity analysis of novel algorithms for annotation-independent detection of transcriptomic alternative splicing isoforms using long-read sequencing
G+PS regularly provides virtual sessions that focus on admission requirements and procedures, and on tips for improving your application.
These videos contain some general advice from faculty across UBC on finding and reaching out to a potential thesis supervisor.
Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay in adding the latest theses.
The full abstract for this thesis is available in the body of the thesis, and will be available when the embargo expires.
BACKGROUND: Non-obstructive azoospermia (NOA) is defined by the complete absence of spermatozoa in repeated ejaculate samples. The vast majority of cases remain unexplained, hindering the development of effective treatments. Existing studies have mostly employed whole genome/exome and bulk RNA sequencing, whereas evaluating testicular cell-type-specific abnormalities in NOA would lay a promising foundation for advancing therapeutics research.
OBJECTIVES: 1. To systematically review all available reports specifying cell-type-specific deviations in the testes of NOA males. 2. To investigate intratesticular heterogeneity in an idiopathic NOA case and elaborate on its cell-type-specific abnormalities using single-cell RNA sequencing (scRNA-seq).
METHODS: For the systematic review, eligibility criteria comprised human studies published in English that specified the cell types of the reported abnormalities in NOA testes. All MEDLINE sources were searched. For the scRNA-seq project, testicular tissue specimens were obtained from an idiopathic NOA patient with intratesticular heterogeneity and from a fertile male undergoing vasectomy reversal. Cell types were identified based on the scRNA-seq data and marker gene expression. Transcriptome dissimilarity, developmental progression, pathway activity, and cell-cell communication among the corresponding cell types of the NOA patient and the normal control were statistically assessed. Immunofluorescence and immunohistochemistry were used to evaluate cellular composition and developmental markers.
RESULTS: The systematic review summarized a diverse range of cellular dysfunctions in most testis cell types in NOA. Reported abnormalities included DNA damage response in Sertoli cells; dysregulation of steroidogenic pathways in Leydig cells; myoid cell-related tubule wall thickening; an increase in immune cells; increased apoptosis and potential changes in spindle formation in germ cells; and more. The scRNA-seq project identified hypospermatogenesis (HS) and Sertoli cell-only phenotypes within the NOA testis; immaturity of Leydig and Sertoli cells expressing early developmental markers, e.g., MAFB; higher dissimilarity between somatic cells of the two NOA phenotypes than between those of each individual state and the normal control; HS germline developmental deviations, especially at the later stages of spermatogenesis; and a significant increase in macrophages and activated T cells, together with an enlarged interstitium, in both phenotypes.
CONCLUSIONS: Abnormalities in both somatic and germ cells are present in NOA testes, and they vary from patient to patient. A precision medicine approach may be necessary for the treatment of NOA.
Two decades after the initial sequencing and assembly of the human genome, the current reference assembly is still not sufficiently representative. Although most efforts to enrich our understanding of genomic variation have focused on single nucleotide polymorphisms, recent studies led by the Human Genome Structural Variation Consortium and the Genome in a Bottle Project aim to characterize structural variations. Still, there are genomic sequences missing from assemblies. These sequences, termed novel sequence insertions, need to be discovered to better characterize human genome diversity. Furthermore, insertions discovered to date have been shown to harbor coding genes and other functional elements, and studies have demonstrated a link between the existence of these insertions and the emergence of certain diseases. Current methods and tools developed for novel sequence insertion discovery suffer from shortcomings such as mapping ambiguity and assembly fragmentation, and lag especially in detecting long insertions. Compared with short-read sequencing, long-read sequencing has a higher base-pair error rate but is less prone to mapping ambiguity caused by short repeats. Moreover, long reads can yield longer, more contiguous segments, resolving common assembly issues. Short reads, on the other hand, are almost error-free and can therefore provide better breakpoint prediction at the base-pair level. Utilizing the complementary characteristics of both technologies, we introduce a novel algorithm, RinsLR, that discovers mid-range (50 bp to 10 kbp) novel sequence insertions. RinsLR uses short reads to accurately identify potential novel sequence insertion breakpoints. It then uses long reads to rebuild and retrieve the inserted sequence. Using simulated experiments and the T2T-CHM13 genome, we evaluated RinsLR, compared it against other tools, and showed that RinsLR achieves consistently high precision and recall rates.
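The abstract above reports precision and recall for insertion calls but does not say how a call is matched to a known insertion. As an illustration only, the sketch below shows one common way such an evaluation could be scored. The `Insertion` type, the 10 bp breakpoint tolerance, and the 0.9 length-ratio threshold are assumptions made for this example and are not taken from the thesis or from RinsLR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insertion:
    """Hypothetical record for one insertion (not RinsLR's output format)."""
    chrom: str
    pos: int      # breakpoint position on the reference
    length: int   # length of the inserted sequence


def is_match(call, truth, pos_tol=10, len_ratio=0.9):
    """A call matches a truth insertion if it lies on the same chromosome,
    its breakpoint is within pos_tol bases, and the two lengths are similar.
    Both thresholds are assumed values for illustration."""
    if call.chrom != truth.chrom:
        return False
    if abs(call.pos - truth.pos) > pos_tol:
        return False
    shorter, longer = sorted((call.length, truth.length))
    return longer > 0 and shorter / longer >= len_ratio


def precision_recall(calls, truths, pos_tol=10, len_ratio=0.9):
    """Greedy one-to-one matching: each truth insertion is used at most once."""
    matched = set()
    true_positives = 0
    for call in calls:
        for i, truth in enumerate(truths):
            if i not in matched and is_match(call, truth, pos_tol, len_ratio):
                matched.add(i)
                true_positives += 1
                break
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall


# Toy data, invented for this example.
truths = [
    Insertion("chr1", 1000, 500),
    Insertion("chr1", 5000, 2000),
    Insertion("chr2", 300, 80),
]
calls = [
    Insertion("chr1", 1004, 510),  # close breakpoint, similar length
    Insertion("chr2", 900, 80),    # breakpoint too far from any truth
]
precision, recall = precision_recall(calls, truths)
```

Greedy matching keeps the sketch short. A stricter evaluation might also compare the inserted sequence itself rather than only its position and length.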
The most severe type of male infertility, known as non-obstructive azoospermia (NOA), is characterized by the absence of sperm in the ejaculate due to spermatogenesis failure. Of all NOA patients, only ~30% are ultimately given a precise diagnosis, leaving the vast majority with no clear explanation for their infertility. Genome/exome-wide studies have identified a range of candidate genes responsible for proper sperm formation and testis development. However, bulk sequencing studies are limited when studying tissue with numerous cell types. This is particularly problematic when investigating cellular dysfunction in testis tissue, whether the tissue comes from normal controls or from men with idiopathic infertility. In contrast, single-cell RNA sequencing (scRNA-seq) provides gene expression profiles at the level of individual cells and can overcome the limitations of bulk RNA sequencing in this regard. Gene expression data within cells are useful in providing insight into cellular programs and intrinsic biological processes. However, a single gene can comprise various isoforms that differ in the order and composition of their exons, which has significant biological implications for subsequent protein translation and function. Whether bulk or single-cell, short-read RNA sequencing technology is limited in revealing gene structure, i.e., isoforms. Long-read RNA-seq, on the other hand, makes it possible to determine the sequence of exons in the transcript isoforms. Recently, new library preparation techniques in single-cell workflows have been developed to take advantage of both long-read RNA sequencing and short-read scRNA-seq libraries. However, computationally identifying the originating cells of the resulting long reads using the single-cell barcodes generated from short reads is not a trivial task because of the relatively high error rate in long reads. The human testis has been shown to have the highest level of RNA splicing compared with other tissues. A previous study investigating bulk proteomics of NOA tissue identified RNA splicing machinery as being among the most dysregulated compared with healthy controls. Therefore, we hypothesize that characterizing RNA isoforms at the single-cell level will provide critical biological insights into normal human spermatogenesis as well as the pathologic underpinnings of the dysregulated spermatogenesis present in NOA.
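To make the barcode-matching difficulty mentioned above concrete, here is a minimal sketch of one simple strategy. Each barcode observed in a long read is assigned to the closest barcode in a short-read-derived whitelist, but only when the match is within a small edit distance and is unambiguous. This is not the method developed in the thesis. The function names, the `max_dist=2` threshold, and the toy barcodes are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions),
    computed with the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def assign_barcode(observed, whitelist, max_dist=2):
    """Return the unique whitelist barcode within max_dist edits of the
    observed sequence, or None if there is no match or the best match is tied."""
    best, best_dist, tie = None, max_dist + 1, False
    for barcode in whitelist:
        d = edit_distance(observed, barcode)
        if d < best_dist:
            best, best_dist, tie = barcode, d, False
        elif d == best_dist and best is not None:
            tie = True
    return None if tie else best


# Toy whitelist standing in for barcodes recovered from short-read data.
whitelist = ["ACGTACGTACGTACGT", "TTTTCCCCGGGGAAAA"]

one_substitution = assign_barcode("ACGTACGTACGTACGA", whitelist)
one_deletion = assign_barcode("ACGTACGACGTACGT", whitelist)
no_match = assign_barcode("GGGGGGGGGGGGGGGG", whitelist)
```

Comparing every read against every whitelist barcode is quadratic and would be too slow for real datasets. The sketch only shows why sequencing errors, and indels in particular, force an approximate rather than exact match, and why ambiguous matches are safer to discard.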