Wyeth Wasserman

Professor

Relevant Degree Programs

 

Graduate Student Supervision

Doctoral Student Supervision (Jan 2008 - May 2019)
Development of human-computer interactive approaches for rare disease genomics (2019)

No abstract available.

Revealing the impact of sequence variants on transcription factor binding and gene expression (2017)

Transcription factors (TFs) can bind to specific regulatory regions to control the expression of target genes. Disruption of TF binding is regarded as one of the key mechanisms by which regulatory variants could act to cause disease. However predicting the functional impact of variants on TF binding remains a major challenge for the field, standing as a key obstacle to achieving the potential of clinical genome analysis. This thesis confronts this challenge from a bioinformatics perspective and addresses two unresolved problems. The first problem is the determination of which genetic variants alter TF binding. Only a small number of allele-specific binding (ASB) events, in which TFs preferentially bind to one of two alleles at heterozygous sites in the genome, have been determined. To study the impact of variants on TF binding, access to a large, gold standard collection of ASB events could facilitate the development of new predictive methods. In Chapter 2, we implemented a pipeline to identify ASB events from ChIP-seq data and applied it to produce one of the largest ASB datasets. We found that ASB events were associated with allelic alterations of TF motifs, chromatin accessibility and histone modifications. Using the available features, classifiers were trained to predict the impact of variants on TF binding. To improve ASB calling, Chapter 3 evaluated five statistical methods, ultimately supporting a method that pooled ChIP-seq replicates and utilized a binomial distribution to model allelic read counts.The second problem is to determine how altered TF binding events impact the expression of target genes. In Chapter 4, we implemented regression-based models to predict gene expression changes based on altered TF binding events across 358 individuals. The models showed predictive capacity for 19.2% of genes, and the key TF binding events in the model provided mechanistic insights as to how these regulatory variants alter gene expression.In summary, this thesis both generated the largest, high-quality collection of ASB events, and developed algorithms to predict variant impact on TF binding and gene expression. The presented work advances the capacity of the field to interpret regulatory variants and will facilitate future clinical genome analysis.

View record

Development and evaluation of software for applied clinical genomics (2016)

High-throughput next-generation DNA sequencing has evolved rapidly over the past 20 years. The Human Genome Project published its first draft of the human genome in 2000 at an enormous cost of 3 billion dollars, and was an international collaborative effort that spanned more than a decade. Subsequent technological innovations have decreased that cost by six orders of magnitude down to a thousand dollars, while throughput has increased by over 100 times to a current delivery of gigabase of data per run. In bioinformatics, significant efforts to capitalize on the new capacities have produced software for the identification of deviations from the reference sequence, including single nucleotide variants, short insertions/deletions, and more complex chromosomal characteristics such as copy number variations and translocations. Clinically, hospitals are starting to incorporate sequencing technology as part of exploratory projects to discover underlying causes of diseases with suspected genetic etiology, and to provide personalized clinical decision support based on patients’ genetic predispositions. As with any new large-scale data, a need has emerged for mechanisms to translate knowledge from computationally oriented informatics specialists to the clinically oriented users who interact with it. In the genomics field, the complexity of the data, combined with the gap in perspectives and skills between computational biologists and clinicians, present an unsolved grand challenge for bioinformaticians to translate patient genomic information to facilitate clinical decision-making. This doctoral thesis focuses on a comparative design analysis of clinical decision support systems and prototypes interacting with patient genomes under various sectors of healthcare to ultimately improve the treatment and well-being of patients. Through a combination of usability methodologies across multiple distinct clinical user groups, the thesis highlights reoccurring domain-specific challenges and introduces ways to overcome the roadblocks for translation of next-generation sequencing from research laboratory to a multidisciplinary hospital environment. To improve the interpretation efficiency of patient genomes and informed by the design analysis findings, a novel computational approach to prioritize exome variants based on automated appraisal of patient phenotypes is introduced. Finally, the thesis research incorporates applied genome analysis via clinical collaborations to inform interface design and enable mastery of genome analysis.

View record

Improving the detection of transcription factor binding regions (2015)

The identification of non-coding regulatory elements in the genome has been the focus of much experimental and computational effort. However, both experimental data, such as ChIP-seq, and computational methods of transcription factor (TF) binding predictions suffer from a degree of non-specificity. ChIP-seq experiments report regions that don’t contain the expected canonical motif for the ChIPped TF, which may arise from indirect binding or a non-TF-specific mechanism. Computational predictions based on sequence-level information alone are plagued by false positives. This thesis explores computational approaches to improve both the interpretation of large-scale TF binding data, and the detection of TF binding regions.In Chapters 2 and 3 we observe that experimentally defined regulatory regions of the human genome are a mixture of sub-groups reflecting distinct properties. On average a third of a ChIP-seq dataset does not contain the targeted TF’s motif, and within this subset up to 45% of the ChIP-seq peaks are unexpectedly enriched for a small class of non-targeted TFs’ motifs. Many of these regions are not specific to a TF but are ChIPped by multiple diverse TFs across multiple cell types. These recurring regions tend to be the lower scoring peaks of a dataset, are less likely to reproduce between experimental replicates, and tend to associate with cohesin and polycomb protein occupied positions in the genome. The regulatory regions with a greater specificity for a TF do not share these properties. Based on these observations we suggest a TF ‘loading-zone’ model to account for the presence of the aforementioned recurrent regions in ChIP-seq data. In Chapter 4 we further explore the regulatory region subgroups with a biophysical simulator of TF occupancy (tfOS). Within tfOS we have incorporated TF-DNA interaction energies, TF search mechanics, cooperative TF interactions, and sequence accessibility data into the model. Simulations with tfOS across sequences reveal distinct features associated with recurrent and non-recurrent regions described in Chapter 3. The research presented has improved our understanding and interpretation of large-scale TF binding data and advanced our understanding of TF regulatory regions, leading to improved annotation and interpretation of the human genome. Supplementary video material is available at: http://hdl.handle.net/2429/51447

View record

Inferring novel relationships through over-representation analysis of medical subjects in biomedical bibliographies (2012)

MEDLINE®/PubMed® is a richly annotated resource of over 21 million article citations, growing at a modern rate of over 600,000 citations annually. One grand challenge of bioinformatics is analysing the extensive literature for a biomedical entity such as a gene or disease. This thesis explores using over-representation to extract pertinent biomedical annotation from the research articles for an entity. The quantitative profiles generated are compared to predict novel associations between entities.Medical Subject Heading Over-representation Profiles (MeSHOPs) are constructed from the primary literature of an entity of interest. Medical subject annotations for each article are extracted. Statistical tests evaluate the significance of each term’s frequency across the set of articles, compared against an appropriate background set. The resulting MeSHOP is composed of each term and corresponding enrichment p-value. MeSHOPs can be computed for any entity with an associated bibliography of PubMed articles. We evaluate the predictive performance of quantitatively comparing MeSHOPs to discover novel associations between gene and disease entities, achieving up to 16% improvement in accuracy compared to gene or disease baseline features (measured as increased Receiver Operating Characteristic Area Under the Curve). Strong literature annotation level bias on the predictive performance for future gene-disease association was seen. We observe similar results in a parallel analysis of associations between drugs and disease.Efficiently identifying authors with similar research interests is a challenge in science. During the peer review process, authors seek scientists with similar expertise. MeSHOPs are generated for individual authors, identifying their research foci. Extending the methods to allow comparison across large sets of entities, overlapping research interests between researchers were identified. The predictive performance was evaluated for capacity to identify authors working in the same research domains. Biomedical annotation analysis of primary literature provides insight into the areas of research focus, and is demonstrated to link entities through similarities in their MeSHOPs. We quantitatively confirm the trend where well-studied genes, diseases and drugs are more likely to be the focus of further research. MeSHOP analysis demonstrates that knowledge in the annotated primary literature can be efficiently mined, and the untapped knowledge therein can be discovered computationally. 

View record

Evolutionary conserved regulatory programs (2011)

No abstract available.

Computational Prediction of Regulatory Element Combinations and Transcription Factor Cooperativity (2010)

No abstract available.

Master's Student Supervision (2010 - 2018)
Bioinformatics design of cis-regulatory elements controlling human gene expression (2017)

Gene therapy has the potential to not only treat, but cure individuals suffering from inherited diseases. Advances in understanding the human genome and the discovery of causal genes underlying diseases has heightened the need to solve the gene therapy challenge. Viral vectors are often used as a delivery tool for therapeutics, but their safety and efficacy are still being studied. To contribute to this goal, we have created 49 small viral promoters by bioinformatically annotating cis-regulatory regions from which a subset are concatenated with the goal of drivingcell-specific expression of a reporter gene. We have tested a subset of these in mice in vivo. Regulatory region analysis can take a trained designer multiple weeks. To resolve this issue, we have created a semi-automated approach to regulatory region identification, named OnTarget. The OnTarget database accumulates thousands of cell and tissue-specific experiments in order to identify regions informative of regulatory properties. OnTarget is able to identify regulatory regions consistent with those identified by designers. In this capacity, we expect OnTarget to lead to betterand faster identification of cis-regulatory regions for the design of promoters targeting specific sets of cells.

View record

Text based methods for variant prioritization (2017)

Despite improvements in sequencing technologies, DNA sequence variant interpretation for rare genetic diseases remains challenging. In a typical workflow for the Treatable Intellectual Disability Endeavor in B.C. (TIDE BC), a geneticist examines variant calls to establish a set of candidate variants that explain a patient's phenotype. Even with a sophisticated computation pipeline for variant prioritization, they may need to consider hundreds of variants. This typically involves literature searches on individual variants to determine how well they explain the reported phenotype, which is a time consuming process. In this work, text analysis based variant prioritization methods are developed and assessed for the capacity to distinguish causal variants within exome analysis results for a reference set of individuals with metabolic disorders.

View record

 
 

If this is your researcher profile you can log in to the Faculty & Staff portal to update your details and provide recruitment preferences.