Paul Pavlidis
Recruitment
Please visit https://pavlab.msl.ubc.ca/research/ for more information on our research.
Complete these steps before you reach out to a faculty member!
- Familiarize yourself with program requirements. You want to learn as much as possible from the information available to you before you reach out to a faculty member. Be sure to visit the graduate degree program listing and program-specific websites.
- Check whether the program requires you to seek commitment from a supervisor prior to submitting an application. For some programs this is an essential step while others match successful applicants with faculty members within the first year of study. This is either indicated in the program profile under "Admission Information & Requirements" - "Prepare Application" - "Supervision" or on the program website.
- Identify specific faculty members who are conducting research in your specific area of interest.
- Establish that your research interests align with the faculty member’s research interests.
- Read up on the faculty members in the program and the research being conducted in the department.
- Familiarize yourself with their work, read their recent publications and past theses/dissertations that they supervised. Be certain that their research is indeed what you are hoping to study.
- Compose an error-free and grammatically correct email addressed to your specifically targeted faculty member, and remember to use their correct titles.
- Do not send non-specific, mass emails to everyone in the department hoping for a match.
- Address the faculty members by name. Your contact should be genuine rather than generic.
- Include a brief outline of your academic background, why you are interested in working with the faculty member, and what experience you could bring to the department. The supervision enquiry form guides you with targeted questions. Be sure to craft compelling answers to these questions.
- Highlight your achievements and why you are a top student. Faculty members receive dozens of requests from prospective students and you may have less than 30 seconds to pique someone’s interest.
- Demonstrate that you are familiar with their research:
- Convey the specific ways you are a good fit for the program.
- Convey the specific ways the program/lab/faculty member is a good fit for the research you are interested in/already conducting.
- Be enthusiastic, but don’t overdo it.
G+PS regularly provides virtual sessions that focus on admission requirements and procedures and offer tips on how to improve your application.
Graduate Student Supervision
Doctoral Student Supervision
Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.
Differential expression (DE) analysis is performed to identify genes associated with a phenotype based on changes in RNA expression levels. The result of various bioinformatics analyses is a hit list of genes that requires further interpretation to identify the functions of these genes and prioritize the genes for further study; there is currently a lack of objective metrics for gene prioritization. The ease of generating transcriptomic data has resulted in the accumulation of massive amounts of data in repositories such as NCBI GEO. In my thesis, I investigate means of harnessing this archived data for interpreting hit lists. First, I describe the development of Gemma, a large corpus containing over 10,000 curated and reprocessed datasets made suitable for data mining. I contributed by establishing the curation guidelines for using ontology concepts during dataset annotation, and by characterizing Gemma’s features. Next, I describe the evaluation of Connectivity Map (CMap), a hit list interpretation framework designed for in silico repositioning of previously approved drugs for treating human diseases. Through a series of analyses, I demonstrated that drug repositioning results between two versions of CMap are discordant, and that this discordance is caused by low reproducibility of DE profiles both between and within each CMap. This demonstrates the importance of high-quality data and careful evaluation of hit list interpretation frameworks. Finally, in a collaboration, we showed that there are huge differences in how often genes are differentially expressed (“DE prior”) across a large corpus of human datasets. We proposed that the prior could be used to facilitate hit list interpretation, identifying genes that are more specifically DE in a studied phenotype. I expanded this work further by examining variables that may influence the DE prior such as microarray platform gene coverage; I found the DE prior robust to these variables. I also demonstrate that given enough data, context-specific (e.g. tissue) or topic-specific DE priors can be developed for topic-specific applications. My work contributes to our knowledge of patterns of gene differential expression and their utility in addressing questions related to gene function in human health and disease.
View record
Most mammalian genes generate multiple transcripts via splicing, and we do not know the function of most splice variants. Currently, there is a debate about how many splice variants are likely nonfunctional or “noisy” transcripts. My thesis explores the claim that alternative splicing vastly increases the genome’s functional diversity in the context of noisy splicing, and in doing so attempts to identify candidate cases for which alternative splicing is likely to be of consequence. To ground computational analyses of genes with multiple splice variants in experimental data, the field needs a corpus of genes that have experimental evidence of functionally distinct splice isoforms (FDSIs). We curated the literature for 743 genes and found that ~5% had literature evidence of FDSIs. This suggests that the claim that alternative splicing vastly increases genomic functional diversity is extrapolated from a few key genes. Next, I developed a pipeline to identify candidate genes with FDSIs using long-read RNA-seq data. The output of my pipeline is a computationally prioritized list of candidate genes likely to have FDSIs based on features such as expression, conservation, functional domains, and coding potential. From an initial set of 6,799 genes with multiple splice variants, I prioritized 79 candidate genes. While I had limited long-read data, my work aids in establishing guidelines for high-throughput prioritization of genes with FDSIs for future study. With our collaborators, I investigated a specific application of my pipeline to the voltage-gated calcium channel gene Cacna1e. Using novel long-read data, I established a set of 2,110 splice variants for Cacna1e. Based on properties of the channel, I determined that at most 154 splice variants are likely to encode a functional channel. My results highlighted the amount of potential noise produced by one gene’s expression. Through my investigation, I added to the growing body of literature in support of noisy splicing.
I also provided the field with a list of interesting genes with multiple splice variants. This includes a gold standard set of genes from the experimental literature, and a novel set of prioritized genes. Both sets of genes will be useful for future studies of gene function.
View record
One of the key features of transcriptomic data is the similarity of expression patterns among groups of genes, referred to as coexpression. It has been shown that coexpressed genes tend to share similar functions. Based on this, a common assumption is that gene coexpression is a result of transcriptional regulation and therefore, regulatory relationships could be inferred from coexpression. However, success in inferring such relationships has been limited and there are questions about the source and interpretation of coexpression. Here I explore coexpression as an observed signal from the data, examine its source and assess its relevance for inferring regulatory relationships. In chapter 2 I studied differential coexpression, which refers to the alteration of gene coexpression between biological conditions. It is commonly assumed that differential coexpression can reveal rewiring of transcription regulatory networks, specifically among the genes that maintain their average expression level between the conditions. However, I show that to a large extent and in contrast to this common assumption, differential coexpression is more parsimoniously explained by changes in average expression levels. This finding demonstrates limitations for inference of regulatory rewiring from coexpression and poses questions for the underlying causes of the observed coexpression. In Chapter 3, I studied cellular composition variation among bulk tissue samples as a source of variance and the observed coexpression. I found that for most genes, differences in expression levels across cell types account for a large fraction of their variance and as a result genes with similar cell-type expression profiles appear to be coexpressed. Finally, I showed that this coexpression dominates the underlying intra-cell-type coexpression and also has the two prominent features of coexpression in the bulk tissue: reproducibility and biological relevance. 
Through my studies, I was able to provide an explanation for much of the observed coexpression in bulk tissue and shed light on its resolution and limitations for inferring regulatory relationships. I also studied coexpression in single-nucleus data and showed that some of the observed coexpression there is likely attributable to transcriptional regulation, which could be a subject for future studies.
View record
Ribonucleic acids (RNAs) are an essential part of cellular function, transcribed from DNA and translated into protein. Rather than a passive informational medium, RNA can also be highly functional and regulatory. Certain RNAs fold into specific structures that give them enzymatic properties, while others bind to specific targets to guide regulatory processes. With the advent of next-generation sequencing, a large number of novel non-coding RNAs have been discovered through whole-transcriptome sequencing. Many efforts have been made to study the structure and binding partners of these novel RNAs, in order to determine their function and roles. This work begins with a description of my R package R4RNA for manipulating RNA basepair data, the building blocks of RNA structure and RNA binding. The package deals with the input/output and manipulation of RNA basepair and sequence data, along with statistical and visualization methods for evaluation, interpretation and presentation. We also describe R-chie, a visualization tool and web server built on R4RNA that visualizes complex RNA basepairs in conjunction with sequence alignments. We then conduct the largest known evaluation of RNA-RNA interaction methods to date, running state-of-the-art tools on curated experimentally validated datasets. We end with a review of cotranscriptional RNA basepair formation, summarizing biological, theoretical and computational methods for the process, and future directions for improving classical methods in RNA structure prediction. All content chapters of this thesis have been peer-reviewed and published. The work on R4RNA has led to two publications, with the package used to great visual effect by various publications and also adopted by the RNA structure database Rfam. My assessment of RNA-RNA interaction is at present the only published evaluation of its kind, and will hopefully become a benchmark for future tool development and a guide to selecting appropriate tools and algorithms.
Our published review on RNA cotranscriptional folding has been well received, being the first review specifically on its topic.
View record
Primarily controlled by gene expression and fine-tuned by translation and degradation rates, protein activity is governed by a plethora of post-translational modifications such as phosphorylation and glycosylation, which generate a diversity of protein species and thereby control complex biological phenotypes. Processing by proteases is a particular modification leading to the irreversible generation of stable protein truncations. Although processing is well understood in examples such as signal peptide or propeptide removal, recent analyses consistently find that >50% of N-terminal peptides map inside the protein sequence predicted by genomics, indicating an important regulatory role of proteases. All proteins undergo protease cleavage as part of processing or degradation, a second biological process controlled by proteases. Proteases are involved in numerous pathologies and commonly considered as drug targets. However, protease research and drug development are complicated, in part due to widespread crosstalk between proteases. Proteases regulate other proteases through direct cleavage or cleavage of protease inhibitors in a complex network of protease interactions, the protease web. Yet, a comprehensive analysis of the protease web has not been performed, hampering assignment of proteases to clear biological roles, their direct substrates, and protease inhibitor drug targeting. A second problem in the identification of protein processing is the potential confound between protein termini generated by protease processing, alternative splicing, and alternative translation. In this thesis, I computationally analyzed large and diverse datasets of protease interactions and protein truncations to gain insight into complex proteolytic processes and to guide biochemical follow-up experiments.
Analyzing protease cleavage, alternative splicing and alternative translation data incorporated into our database TopFIND, I found that protease cleavage and alternative translation likely generate most protein truncations. Combining protease cleavage and inhibition data in a graph model of the protease web, I demonstrated extensive protease crosstalk and then predicted and validated a proteolytic pathway. Finally, investigating strategies for the prediction of protease inhibition, I predicted hundreds of protease-inhibitor interactions, and validated inhibition of kallikrein-5 by serpin B12. This work thus generated predictions for biochemical follow-up as well as important insights into the regulation of biological systems through proteases.
View record
Neuroscience research is increasingly dependent on bringing together large amounts of data collected at the molecular, anatomical, functional and behavioural levels. This data is disseminated in scientific articles and large online databases. I utilized these large resources to study the wiring diagram of the brain or ‘connectome’. The aims of this thesis were to automatically collect large amounts of connectivity knowledge and to characterize relationships between connectivity and gene expression in the rodent brain. To extract the knowledge embedded in the neuroscience literature I created the first corpus of neuroscience abstracts annotated for brain regions and their connections. These connections describe long distance or macroconnectivity between brain regions. The collection of over 1,300 abstracts allowed accurate training of machine learning classifiers that mark brain region mentions (76% recall at 81% precision) and neuroanatomical connections between regions (50% sentence level recall at 70% precision). By automatically extracting connectivity statements from the Journal of Comparative Neurology I generated a literature based connectome of over 28,000 connections. Evaluations revealed that a large number of brain region descriptions are not found in existing lexicons. To address this challenge I developed novel methods that allow mapping of brain region terms to enclosing structures. To further study the connectome I moved from scientific articles to large online databases. By employing resources for gene expression and connectivity I showed that patterns of gene expression correlate with connectivity. First, two spatially anti-correlated patterns of mouse brain gene expression were identified. These signatures are associated with differences in expression of neuronal and oligodendrocyte markers, suggesting they reflect regional differences in cellular populations. 
Expression level of these genes is correlated with connectivity degree, with regions expressing the neuron-enriched pattern having more incoming and outgoing connections with other regions. Finally, relationships between profiles of gene expression and connectivity were tested. Specifically, I showed that brain regions with similar expression profiles tend to have similar connectivity profiles. Further, optimized sets of connectivity linked genes are associated with neuronal development, axon guidance and autistic spectrum disorder. This demonstration of text mining and large scale analysis provides new foundations for neuroinformatics.
View record
Schizophrenia is a severe psychiatric illness for which the precise etiology remains unknown. Studies using postmortem human brain have become increasingly important in schizophrenia research, providing an opportunity to directly investigate the diseased brain tissue. Gene expression profiling technologies have been used by a number of groups to explore the postmortem human brain and seek genes which show changes in expression correlated with schizophrenia. While this has been a valuable means of generating hypotheses, there is a general lack of consensus in the findings across studies. Expression profiling of postmortem human brain tissue is difficult due to the effect of various factors that can confound the data. The first aim of this thesis was to use control postmortem human cortex for identification of expression changes associated with several factors, specifically: age, sex, brain pH and postmortem interval. I conducted a meta-analysis across the control arm of eleven microarray datasets (representing over 400 subjects), and identified a signature of genes associated with each factor. These genes provide critical information towards the identification of problematic genes when investigating postmortem human brain in schizophrenia and other neuropsychiatric illnesses. The second aim of this thesis was to evaluate gene expression patterns in the prefrontal cortex associated with schizophrenia by exploring two methods of analysis: differential expression and coexpression. Seven schizophrenia microarray studies of prefrontal cortex were combined for a total of 153 subjects with schizophrenia and 153 healthy controls. Meta-analysis was conducted with careful consideration for the effects of covariates, revealing a robust list of 98 differentially expressed ‘schizophrenia genes’. 
Using the same seven schizophrenia datasets, coexpression networks were generated for control and schizophrenia cohorts within each dataset and then combined across studies using a rank aggregation approach. Topological properties of our ‘schizophrenia genes’ were evaluated in the context of each network, highlighting differences in correlation structure of these genes in the control and schizophrenia brain. Together these results converge towards a general conclusion, emphasizing that the integration of postmortem human brain expression profiling data improves statistical power and is particularly useful in detecting subtle yet consistent changes in expression associated with schizophrenia.
View record
Master's Student Supervision
Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.
The physical substrate of memory in the brain is still a subject of debate. It is thought that strongly connected networks of neurons, called engrams, can reproduce the original activity pattern from partial cues. Synapses between engram neurons are thus the most likely candidate for memory storage, but questions remain over their long-term stability. Some groups have proposed that activity-related transcription, while normally considered transient, could be the beginning of a transition to a more permanent cell type that stores long-term memory. Persistent chromatin conformation changes and resulting transcription changes, triggered by reactivation, could be a stable long-lasting storage mechanism which enables remembering at remote time points. Some groups have reported transcription predicted by this model. We trained classifiers using single-cell RNA sequencing (scRNA-seq) data from the earlier transient signature (Jaeger et al., 2018; Lacar et al., 2016) and the long-term engram neuron signature (Chen et al., 2020). The transient early signature was readily identifiable, but when trying to identify engram cells exhibiting the long-term memory signature we found a significant decrease in classifier performance. The important features of the classifier were not genes reported as differentially expressed in the original publication. Reproducing the original authors’ results using their data proved challenging, suggesting the persistent long-term memory signature was not detected or does not exist. Reactivation did not appear to elicit a strong transcriptional response either, which contradicts models of transcriptomic engram cell formation. Unfortunately, the design of the original experiment does not allow for the falsification of the supposed persistent transcriptional program induced by reactivation. My research suggests future directions to take in evaluating transcription’s contribution to synaptic plasticity and memory.
View record
Establishing the molecular diversity of cell types is crucial for the study of the nervous system. I compiled a cross-laboratory database of mouse brain cell type-specific transcriptomes from 36 major cell types from across the mammalian brain using rigorously curated published data from pooled cell type microarray and single-cell RNA-sequencing (RNA-seq) studies. I used these data to identify cell type-specific marker genes, discovering a substantial number of novel markers, many of which we validated using computational and experimental approaches. By examining datasets with known cell type proportion differences, I further demonstrate that summarized expression of marker gene sets (MGSs) in bulk tissue data can be used to estimate the relative cell type abundance across samples. Using this approach, I show that the majority of genes previously reported as differentially expressed in Parkinson’s disease can be attributed to the reduction in dopaminergic cell number rather than regulatory events. To facilitate use of this expanding resource, I provide a user-friendly web interface at www.neuroexpresso.org.
View record
Coexpression analysis has been widely used for gene function prediction, based on the principle of guilt by association. Most studies use transcriptomic data obtained from bulk tissues, where the expression level of genes reflects the contribution of multiple cell types. Previous work has already documented how variability of cellular composition impacts coexpression analysis. However, the connection between the predictability of gene functions, coexpression networks and cell type profiles has not been studied. I hypothesized that one reason bulk-data-derived coexpression networks contain signals relevant to function prediction is that they contain information about genes’ expression profiles across cell types. Focusing on human brain datasets, I applied several approaches to test this hypothesis, including creating simulated bulk datasets from single-nucleus data and bulk data deconvolution. I find that much predictive power can be attributed to cell type proportion variation. Consequently, a more explicit and interpretable function prediction can be made directly using expression patterns across cell types, which not only yields similar results but also clearly reveals the association between the functional terms and specific cell types. These findings have important implications for coexpression analysis and function prediction.
View record
A persistent challenge in genetics and genomics is the interpretation of “hit lists” of genes, leading to the development and almost universal application of methods such as Gene Ontology (GO) enrichment analysis. While these methods have been of unquestionable utility, GO enrichment and similar approaches based on gene annotations leave much to be desired, and they are often used as a “sanity check” rather than a way to make discoveries. To offer a complementary perspective with the potential to remedy some existing challenges, I developed and evaluated an algorithm that helps put hit lists of genes into biological context by performing large-scale mining on patterns of differential expression (DE). In this work, I present the development and evaluation of my algorithm, which mines over 10,000 transcriptomic datasets in a process we term “condition enrichment”. The output of the algorithm is a list of biological condition comparisons (drug treatments, diseases, etc.) scored according to their relatedness (in terms of DE) to the query genes. I show that performing searches on gene sets of a priori interest enables my algorithm to effectively identify known gene-condition relationships in real and simulated data, providing a useful summary of the condition comparisons most highly associated with the differential expression of the gene set. Finally, I present a powerful open-source web application to provide researchers access to Gemma DE, in the hope that it will aid future research.
View record
Quantitative analysis of large single-cell measures acquired by phospho flow cytometry typically involves establishing inclusion gate thresholds and combining measures from accepted cells into a single median metric. Though this analysis method is simple, it overlooks the heterogeneity of cell populations, and information available at the single-cell level may be lost. Here, we have formulated approaches that can recognize the heterogeneity and extract additional information involving dose-response and interactions between multiple molecules from phospho flow cytometry datasets. Using phospho flow multiplexed sampling of cell physical features, and primary antibodies against protein markers, including GAPDH as a protein expression control, HA tag as an exogenous gene/variant transfection measurement, and 8 antibodies detecting the activation (phosphorylation) states of 8 proteins within conserved molecular pathways, two panels of phospho-specific antibodies were used simultaneously for multiplexed measures in the same cells. Our approach involves single-cell standardization, fitting loess regression, identifying linear domains in dose-response plots, building linear mixed-effects models, and multi-dimensional analyses to detect interactions between phosphorylated protein markers. We demonstrate the utility of this approach by measuring the effect of expressing wild-type PTEN and 5 PTEN variants (4A, D268E, Y138L, P38H, G129E) on 8 markers of molecular pathways downstream of PTEN; we also expressed wild-type RHEB to test its impact on markers in the shared associated pathways. We succeeded in differentiating subtypes of PTEN loss-of-function variants and were able to predict that PTEN P38H is a loss-of-lipid-phosphatase-function variant. We were also able to infer that pAKT, p4EBP1, pS6, and pCREB are all downstream targets of PTEN regulation while pAKT is between PTEN and p4EBP1, pS6, or pCREB.
In conclusion, our results demonstrate dose response and molecular pathway interactions unavailable from reducing population data to single values, and our approach manifests strong promise in variant function measurement and molecular signaling pathway inference.
View record
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by impairments in social interaction and communication, and restrictive repetitive behaviours or interests, with extreme phenotypic and genetic heterogeneity. Currently, genetic association studies have identified 90 risk genes with high confidence out of an estimated 1000. Researchers have begun to use machine learning methods leveraging heterogeneous biological network data in attempts to aid in discovery of ASD risk genes. However, the real-world utility of these studies is questionable: network-based machine learners are often biased towards well studied genes because they operate on a principle called “guilt by association.” In this thesis, I evaluate and compare genetic and computational approaches to ASD risk gene prioritization. I demonstrate that network-based computational approaches add little useful information compared to genetic approaches for prioritization. Furthermore, I demonstrate that gene expression profiles and generic measures of disease gene likelihood may provide less biased contextual information that can be used to supplement genetic association data to prioritize ASD risk genes. Lastly, I discuss how data quality and data dependence impact evaluation of machine learning algorithms and genetic association studies.
View record
An accurate phylogeny of a cancer tumour has the potential to shed light on numerous phenomena, such as key oncogenetic events, relationships between clones, and evolutionary responses to treatment. Most work in cancer phylogenetics to date relies on bulk tissue data, which can resolve only a few genotypes unambiguously. Meanwhile, single-cell technologies have considerably improved our ability to resolve intra-tumour heterogeneity. Furthermore, most cancer phylogenetic methods use classical approaches, such as Neighbor-Joining, which put all extant species on the leaves of the phylogenetic tree. But in cancer, ancestral genotypes may be present in extant populations. There is a need for scalable methods that can capture this phenomenon. We have made progress on this front by developing the Genotype Tree representation of cancer phylogenies, implementing three methods for reconstructing Genotype Trees from binary single-nucleotide variant profiles, and evaluating these methods under a variety of conditions. Additionally, we have developed a tool that simulates the evolution of cancer cell populations, allowing us to systematically vary evolutionary conditions and observe the effects on tree properties and reconstruction accuracy. Of the methods we tested, Recursive Grouping and Chow-Liu Grouping appear to be well-suited to the task of learning phylogenies over hundreds to thousands of cancer genotypes. Of the two, Recursive Grouping has the strongest and most stable overall performance, while Chow-Liu Grouping has a superior asymptotic runtime that is competitive with Neighbor-Joining.
View record
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder affecting roughly 1% of the human population. Genomics research to date has discovered only a fraction of the variants causative for ASD. To this end, we whole-genome sequenced a cohort of 119 ASD individuals in order to find likely pathogenic variation. After quality and frequency filters, we prioritized variants as likely causal according to rarity and predicted damage scores (CADD and Snap2). Here, we report five de novo damaging variants and seven likely damaging variants of unknown inheritance. Since much of the variation reported in ASD cases is uncertain both in function and in significance in ASD, we aimed to functionally characterize missense variants from the ASD literature in PTEN and SYNGAP1, two well-characterized ASD genes. We curated missense variants of unknown significance from the ASD literature and assayed their functional effect in yeast using a Synthetic Genetic Array. We chose previously biochemically validated variants, population variants, and other variants in the genes of interest to gain insight into the functional diversity of PTEN and SYNGAP1 variation. We established functional effect of the ASD variants of unknown significance in PTEN and showed that computational predictors of damage are reasonable predictors of variants’ functional effects in yeast. We found that agreement of computational metrics breaks down when predicting damage in certain genes, such as SYNGAP1. Functionalizing variants in this way contributes to our understanding of the range of functional effects of ASD variants.
View record
Expression patterns across tissues are a primary indicator of gene function. High-throughput technology has produced many cross-tissue transcriptomic data sets (tissue panel data sets). However, the existence of multiple tissue panel data sets makes it difficult for the scientific community to judge whether they are equally valid or to decide which one to use. To date, these data sets have been neither well compared nor fully evaluated. In my Master’s thesis, I collected a large number of publicly available tissue panel data sets, harmonized them, and integrated them into a tissue expression atlas covering human and mouse data. I then compared and contrasted the data sets across the atlas and evaluated each one preliminarily with a gene-specific disagreement index that I developed. In general, the data sets agreed well. However, in certain data sets the amount of disagreement was high, which suggests that their quality is suspect. Applying the disagreement index, I was able to offer a summarized expression pattern in the tissue expression atlas with either consensus or disagreements outlined. I also developed a web-based prototype for accessing this atlas. Furthermore, I explored the range of changes in gene expression patterns that may be caused by experimental conditions, such as diseases or drug treatments. I found that most of the changes were not as dramatic as a change from unexpressed to highly expressed, even though they were reported as statistically significant in the literature. Only a couple of conditions, such as cancer or inflammation, could cause an unexpressed-to-highly-expressed change, because tissue composition was changed substantially in those conditions.
View record
Recently, there has been a major effort by neuroscientists to systematically organize and integrate vast quantities of brain data. However, electrophysiological properties have been shown to be sensitive to experimental conditions, so directly comparing them between experiments could lead to inconsistent results. Here, I characterize the general effects of differences in experimental solution composition on the reported ephys measurements. For that purpose, I employ text mining, supplemented with manual curation, to gather experimental solution information from published neurophysiological articles. I integrate the extracted information into the existing NeuroElectro database, which contains the electrophysiology, neuron type and experimental conditions information (temperature, electrode type, animal age, etc.) from the same literature. Exploring commonly used experimental solution recipes, I found that solution composition explains only a small amount of the variance in electrophysiological properties, relative to the existing ephys variability. Then, I created models for predicting the variability of ephys properties commonly reported by neurophysiologists, using the available experimental conditions information. These models can be used to remove a portion of the ephys variance when comparing results from different experiments, generally making such comparisons more reliable. To validate their performance, I adjusted a portion of NeuroElectro data to the experimental conditions used by the Allen Institute for Brain Science and compared the respective ephys properties before and after the adjustment.
View record
There is intense interest in understanding the molecular mechanisms that contribute to neurodegenerative disorders (NDs), which involve complex interplays of genetic and environmental factors. Catching the early events involved in disease initiation requires investigating pre-symptomatic brain samples. It is difficult to capture early molecular events using post-mortem human brain samples, since these samples represent the late phase of the disorder with progressive brain damage and neurodegeneration. Disease mouse models are developed to study disease progression and pathophysiology. Here, I focus on two of the most studied NDs: Alzheimer’s disease (AD) and Huntington’s disease (HD). Mouse models developed for each disease (AD or HD) often share similar phenotypes mimicking human disease symptoms, which suggests potential common underlying mechanisms of disease initiation and progression across mouse models of the same disease. Investigating gene expression profiles of pre-symptomatic animals from different mouse models may shed light on the mechanisms occurring in the early disease phase. Gene expression profiling analyses have been performed on mouse models, and some of these studies investigate the molecular changes in the pre-symptomatic phase of AD and HD respectively. However, their findings have not reached a clear consensus. To identify shared molecular changes across mouse models, I conducted a systematic meta-analysis of gene expression in mouse models of AD and HD, consisting of 369 gene expression profiles from 23 independent studies. The goal of this project is to identify transcriptional alterations shared among different mouse models of each disease respectively, especially changes during the early disease phase that may link to disease-causing mechanisms, and potential common cross-disease changes.
For both of the disorders, the results showed subtle but biologically interpretable changes shared across mouse models in the early disease phase that may contribute to the early disease progression: dysregulation of genes involved in cholesterol biosynthesis and complement system in AD mouse models and genes encoding mitochondrial respiratory chain complexes in HD mouse models. Cross-disease similarities in the late phase suggested that different brain regions may share mechanisms in response to neuronal loss and toxic protein aggregates.
View record
Proteins are macromolecules responsible for a wide range of activities in the structure and function of cells. Their activities have been described in different contexts as a means to elucidate their "function". These descriptions have been captured across biological databases in a standardized format called Gene Ontology Annotations (GOA), to disseminate the knowledge and extrapolate the information to other proteins whose function is still unknown. Furthermore, the annotations are used to analyse and interpret data from high-throughput studies and also as a benchmark for the assessment of protein function prediction algorithms. Constant changes occur in GOA that can potentially impact such usages, but only limited effort has been put into exploring their instability or assessing the impact that these changes have on reproducibility or interpretation of previous analyses. In the present work, I performed the most comprehensive analysis of the annotation instability for 14 representative model organisms (E. coli, fruit fly, mouse, etc.). The results showed important instability patterns that were species-specific. As such information would be of use to the community to trace the instability of annotations of their interest, a web-based visualization tool was built to track these changes on a protein, functional term and species specific basis. Additionally, we identified artifacts in the annotation data that can be attributed to curation patterns. We propose that such artifacts be considered for a more accurate assessment of function prediction algorithms. Furthermore, the impact that changes in the annotations have on common settings like gene set enrichment analyses was also explored. In particular, 2,000 datasets were used to assess the robustness of enrichment results over time. On average, the results would display a 60% similarity after only 2 years.
However, cases were found where the similarity dropped 80% within the same year, demonstrating the impact that the instability has on such applications. In conclusion, the results of this work will prove useful for those who use the annotations to interpret their studies and want to assess their reliability on a case-by-case basis.
View record
DNA methylation is thought to play an important role in the regulation of mammalian gene expression. Part of the evidence for this role is the observation that lack of CpG island methylation in gene promoters is associated with high transcriptional activity. However, CpG island methylation level only accounts for a fraction of the variance in gene expression, and methylation in other domains is hypothesized to play a role (e.g., island shores and shelves). We set out to improve understanding of the human methylome through a meta-analysis approach, using 1737 samples from 30 publicly available studies. An initial screen identified 15224 CpGs that are “ultra-stable” in their state, being always fully methylated or unmethylated across diverse tissues, cell types and developmental stages (974 always methylated; 14250 always unmethylated). A further analysis of ultra-stable CpGs led us to identify a novel class of CpG islands, “ravines”, that exhibit a markedly consistent pattern of low methylation with highly methylated flanking shores and shelves. Our findings were validated using independent and heterogeneous datasets assayed on the same and different technologies. Building on additional existing data types such as gene expression microarrays, DNase hypersensitive sites, and histone modifications, we found that ravines are associated with higher gene expression, compared to typical unmethylated CpG islands. This finding suggests a novel role for methylation in promoters, markedly different from the traditional view that active promoters need to be unmethylated. We propose ravines are a new class of CpG islands, established early in development and maintained through differentiation, that mark universally active genes and provide new evidence that methylation beyond the CpG island could play a role in gene expression.
View record
The first chapter of this thesis explored the dominant gene expression pattern in the adult human brain. We discovered that the largest source of variation can be explained by cell type marker expression. Across brain regions, expression of neuron cell type markers is anti-correlated with the expression of oligodendrocyte cell type markers. Next, we explored gene function convergence and divergence in the adult mouse brain. Our contributions are as follows. First, we provide candidate cell type markers for investigating specific cell type populations. Second, we highlight orthologous genes that show functional divergence between human and mouse brains. In the second chapter, we present our preliminary work on the effects of tissue types and experimental conditions on human microarray studies. First, we measured the expression and differential expression levels of tissue-enriched genes. Next, we identified modules with similar expression levels and differential expression p-values. Our results show that expression levels reflect tissue type variation. In contrast, differential expression levels are more complex, owing to the large diversity of experimental conditions in the data. In summary, our work provides a different perspective on the functional roles of genes in human microarray studies.
View record
Declines in Pacific salmon stocks in recent decades have spurred much research into their physiology and survivorship, but comparatively little into their genomics. Sockeye salmon in particular are experiencing high levels of mortality during their migration upriver, and the numbers of returning sockeye have fluctuated wildly with respect to predictions in recent years. The goal of my project is to gain insight into the basic genomics of Pacific salmon stocks, including the sockeye, through bioinformatic approaches to gene expression profiling. Using microarray technology, I have conducted a large-scale analysis of over 1,000 samples from multiple tissues, stocks, and species of salmon. I identified tissue-specific and housekeeping genes and compared them to orthologs in mouse and human, respectively. I have also classified a number of microarray samples with a support vector machine (SVM) using qPCR data showing the presence of several common pathogens affecting Pacific salmon populations. Using identified housekeeping genes as normalizing factors, I modeled in silico a qPCR assay designed to identify salmon as infected or uninfected with a particular pathogen. With these data I hope to increase basic knowledge of the genomics of the Pacific salmon.
View record
Autism spectrum disorders (ASD) are clinically heterogeneous and biologically complex. State of the art genetics research has unveiled a large number of variants linked to ASD. But in general it remains unclear what biological factors lead to changes in the brains of autistic individuals. We build on the premise that these heterogeneous genetic or genomic aberrations will converge towards a common impact downstream, which might be reflected in the transcriptomes of individuals with ASD. Similarly, a considerable number of transcriptome analyses have been performed in attempts to address this question, but their findings lack a clear consensus. As a result, each of these individual studies has not led to any significant advance in understanding the autistic phenotype as a whole. The goal of this research is to comprehensively re-evaluate these expression profiling studies by conducting a systematic meta-analysis. Here, we report a meta-analysis of over 1000 microarrays across twelve independent studies on expression changes in ASD compared to unaffected individuals, in blood and brain. We identified a number of genes that are consistently differentially expressed across studies of the brain, suggestive of effects on mitochondrial function. In blood, consistent changes were more difficult to identify, despite individual studies tending to exhibit larger effects than the brain studies. Our results are the strongest evidence to date of a common transcriptome signature in the brains of individuals with ASD.
View record
The first chapter of this thesis addresses a common problem in genomics experiments: interpreting a resulting "hit list" of interesting genes. We present work on an approach for summarizing and exploring "hit lists" that makes use of the large amount of gene expression data in public repositories such as the Gene Expression Omnibus. We compare the query list with datasets that we have analyzed for differential expression of genes. Studies that have similarities to the given hit list yield additional insights, help contextualize studies, and serve as a basis for future meta-analysis. A conceptually similar problem that we addressed is the classification or clustering of datasets based on patterns of differential expression. Both problems required a method for determining distances between datasets based on rankings of genes. We tested and benchmarked several methods using manually annotated datasets. The method that performed best according to our evaluation process is based on Kendall's Tau top-k distance. We investigated potential sources of confounds, finding that the largest challenge may be posed by the high prevalence of certain gene expression patterns. These highly prevalent patterns tended to dominate search results. Nonetheless, we demonstrated the effectiveness of this approach in a case study. In the second chapter, we investigated the role of microRNAs in the context of major depression and suicide. We profiled microRNA and messenger RNA levels in post-mortem prefrontal cortex and hippocampus brain tissue of depressed suicides, suicides, and controls. In the prefrontal cortex, we found miR-1202 to be down-regulated in suicides versus controls, and LCT (lactase enzyme) was up-regulated in suicides or depressed suicides compared to controls. The former result was independently confirmed using quantitative PCR. 
While further study is needed, our results have the potential to provide insight into molecular changes in the brains of depressed and suicidal individuals.
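As an illustration of the distance mentioned in the abstract above, the following is a minimal sketch of one common formulation of a Kendall's tau distance for top-k lists (the penalty-based variant with a parameter p). The abstract does not say which variant or parameter the thesis used, so the function name `kendall_tau_top_k`, the default `p=0.5`, and the gene identifiers are assumptions made only for this example.

```python
from itertools import combinations


def kendall_tau_top_k(list_a, list_b, p=0.5):
    """Penalty-based Kendall's tau distance between two ranked top-k lists.

    Assumption: items missing from a list are treated as ranked below
    every item present in it; p is the penalty for a pair whose relative
    order cannot be inferred from one of the lists.
    """
    rank_a = {item: pos for pos, item in enumerate(list_a)}
    rank_b = {item: pos for pos, item in enumerate(list_b)}
    total = 0.0
    for i, j in combinations(sorted(set(rank_a) | set(rank_b)), 2):
        in_a = (i in rank_a, j in rank_a)
        in_b = (i in rank_b, j in rank_b)
        if all(in_a) and all(in_b):
            # Both items ranked in both lists: penalize opposite order.
            if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0:
                total += 1
        elif all(in_a) and any(in_b):
            # Both in A, exactly one in B: the one present in B is
            # implicitly ahead there, so penalize if A ranks it behind.
            present, absent = (i, j) if in_b[0] else (j, i)
            if rank_a[absent] < rank_a[present]:
                total += 1
        elif all(in_b) and any(in_a):
            # Mirror image of the previous case.
            present, absent = (i, j) if in_a[0] else (j, i)
            if rank_b[absent] < rank_b[present]:
                total += 1
        elif all(in_a) or all(in_b):
            # Both in one list, neither in the other: order unknown.
            total += p
        else:
            # Each item appears in a different list only: always disagree.
            total += 1
    return total


# Hypothetical hit lists, for illustration only.
hits_1 = ["geneA", "geneB", "geneC"]
hits_2 = ["geneC", "geneB", "geneA"]
distance = kendall_tau_top_k(hits_1, hits_2)
```

A smaller value means the two ranked lists agree more closely; identical lists give zero.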
View record
Microarray expression data sets vary in size, data quality and other features, but most methods for selecting coexpressed gene pairs use a ‘one size fits all’ approach. There have been many different procedures for selecting coexpressed gene pairs of high functional similarity from an expression dataset. However, it is not clear which procedure performs best, as there are few studies reporting comparisons of these approaches. The goal of this thesis is to develop a set of “best practices” for selecting coexpression links of high functional similarity from an expression dataset, along with methods for identifying datasets likely to yield poor information. With these goals, we hope to improve the quality of gene function predictions produced by coexpression analysis. Using 80 human expression datasets, we examined the impact of different thresholds, correlation metrics, expression data filtering and transformation procedures on performance in functional prediction. We also investigated the relationship between data quality and other features of expression datasets and their performance in functional prediction. We used the annotations of the Gene Ontology as a primary metric to measure similarity in gene function, and employed additional functional metrics for validation. Our results show that several dataset features have a greater influence on the performance in functional prediction than others. Expression datasets which produce coexpressed gene pairs of poor functional quality can be identified by a similar set of data features. Some procedures used in coexpression analysis have a negligible effect on the quality of functional predictions, while others are essential to achieving the best performance. We also find that some procedures interact greatly with features of expression datasets and that these interactions increase the number of high quality coexpressed gene pairs retrieved through coexpression analysis.
This thesis uncovers important information on the many intrinsic and extrinsic factors that influence the performance in functional prediction of coexpression analysis. The information summarized here will help guide future studies using coexpression analysis and improve the quality of gene function predictions.
View record
Publications
- Characterizing the targets of transcription regulators by aggregating ChIP-seq and perturbation expression data sets (2022)
- ModelMatcher: A scientist‐centric online platform to facilitate collaborations between stakeholders of rare and undiagnosed disease research (2022). Human Mutation
- The Canadian Open Neuroscience Platform – An Open Science Framework for the Neuroscience Community (2022)
- Curation of over 10 000 transcriptomic studies to enable data reuse (2021). Database
- In silico discovery of small molecules for efficient stem cell differentiation into definitive endoderm (2021)
- ModelMatcher: A scientist-centric online platform to facilitate collaborations between stakeholders of rare and undiagnosed disease research (2021)
- A low affinity cis-regulatory BMP response element restricts target gene activation to subsets of Drosophila neurons (2020). eLife, 9
- Can machine learning aid in identifying disease genes? The case of autism spectrum disorder (2020)
- Multi-parametric analysis of 58 SYNGAP1 variants reveal impacts on GTPase signaling, localization and protein stability (2020)
- Untangling the effects of cellular composition on coexpression analysis (2020). Genome Research, 30 (6), 849-859
- Evaluation of Connectivity Map shows limited reproducibility in drug repositioning (2019)
- Mega-Analysis of Gene Expression in Mouse Models of Alzheimer’s Disease (2019). eNeuro, ENEURO.0226-19.2019
- Systematic phenomics analysis of ASD-associated genes reveals shared functions and parallel networks underlying reversible impairments in habituation learning (2019)
- Transcriptomic correlates of electrophysiological and morphological diversity within and across excitatory and inhibitory neuron classes (2019). PLOS Computational Biology, 15 (6), e1007113
- Transcriptomic correlates of electrophysiological and morphological diversity within and across neuron types (2019)
- VariCarta: a comprehensive database of harmonized genomic variants found in ASD sequencing studies (2019)
- VariCarta: A Comprehensive Database of Harmonized Genomic Variants Found in Autism Spectrum Disorder Sequencing Studies (2019). Autism Research
- A critical assessment of single-cell transcriptomes sampled following patch-clamp electrophysiology (2018)
- Monitoring changes in the Gene Ontology and their impact on genomic data analysis (2018). GigaScience
- Systematic evaluation of isoform function in literature reports of alternative splicing (2018). BMC Genomics, 19 (1)
- Transcriptomic Evidence for Alterations in Astrocytes and Parvalbumin Interneurons in Subjects With Bipolar Disorder and Schizophrenia (2018). Biological Psychiatry
- Cross-Laboratory Analysis of Brain Cell Type Transcriptomes with Applications to Interpretation of Bulk Tissue Data (2017). eNeuro, 4 (6), ENEURO.0212-17.2017
- Protease-Inhibitor Interaction Predictions: Lessons on the Complexity of Protein–Protein Interactions (2017). Molecular & Cellular Proteomics, 16 (6), 1038-1051
- DNA methylation signature of human fetal alcohol spectrum disorder (2016). Epigenetics and Chromatin, 9 (1)
- EGAD: Ultra-fast functional analysis of gene networks (2016)
- Interactive Exploration, Analysis, and Visualization of Complex Phenome-Genome Datasets with ASPIREdb (2016)
- Interactive Exploration, Analysis, and Visualization of Complex Phenome–Genome Datasets with ASPIREdb (2016). Human Mutation, 37 (8), 719-726
- Meta-Analysis of Gene Expression Patterns in Animal Models of Prenatal Alcohol Exposure Suggests Role for Protein Synthesis Inhibition and Chromatin Remodeling (2016). Alcoholism: Clinical and Experimental Research, 40 (4), 717-727
- Profiling placental and fetal DNA methylation in human neural tube defects (2016). Epigenetics and Chromatin, 9 (1)
- Secondary neurotransmitter deficiencies in epilepsy caused by voltage-gated sodium channelopathies: A potential treatment target? (2016). Molecular Genetics and Metabolism, 117 (1), 42-48
- Using predictive specificity to determine when gene set analysis is biologically meaningful (2016)
- Whose sample is it anyway? Widespread misannotation of samples in transcriptomics studies (2016)
- Wikidata as a semantic framework for the Gene Wiki initiative. (2016)
Database, 2016 - Expansion of the QARS deficiency phenotype with report of a family with isolated supratentorial brain abnormalities (2015)
Neurogenetics, 16 (2), 145-149 - Meta-analysis of gene expression in Autism spectrum disorder (2015)
Autism Research, 8 (5), 593-608 - Metaanalysis of flawed expression profiling data leading to erroneous Parkinson's biomarker identification (2015)
Proceedings of the National Academy of Sciences of the United States of America, 112 (28) - Prenatal Alcohol Exposure Alters Steady-State and Activated Gene Expression in the Adult Rat Brain (2015)
Alcoholism: Clinical and Experimental Research, 39 (2), 251-261 - Proteome TopFIND 3.0 with TopFINDer and PathFINDer: Database and analysis tools for the association of protein termini to pre- and post-translational events (2015)
Nucleic Acids Research, 43 (D1), D290-D297 - Text mining for neuroanatomy using whitetext with an updated corpus and a new web application (2015)
Frontiers in Neuroinformatics, 9 (May) - The path of no return—Truncated protein N-termini and current ignorance of their genesis (2015)
Proteomics, 15 (14), 2547-2552 - Transcriptome sequencing of the anterior cingulate in bipolar disorder: dysregulation of G protein-coupled receptors. (2015)
American Journal of Psychiatry, 172 (11), 1131-1140 - Bias tradeoffs in the creation and analysis of protein–protein interaction networks (2014)
Journal of Proteomics, - Copy number variants (CNVs) analysis in a deeply phenotyped cohort of individuals with intellectual disability (ID) (2014)
BMC Medical Genetics, 15 (1) - De novo mutations in schizophrenia implicate chromatin remodeling and support a genetic overlap with autism and intellectual disability (2014)
Molecular Psychiatry, 19 (6), 652-658 - Gingival tissue transcriptomes identify distinct periodontitis phenotypes (2014)
Journal of Dental Research, 93 (5), 459-468 - MiR-1202 is a primate-specific and brain-enriched microRNA involved in major depression and antidepressant treatment (2014)
Nature Medicine, 20 (7), 764-768 - Network Analyses Reveal Pervasive Functional Regulation Between Proteases in the Human Protease Web (2014)
PLoS Biology, 12 (5) - Pitfalls in the application of gene-set analysis to genetics studies (2014)
Trends in Genetics, 30 (12), 513-514 - Transcriptomic responses to high water temperature in two species of Pacific salmon (2014)
Evolutionary Applications, 7 (2), 286-300 - Assessing identity, redundancy and confounds in Gene Ontology annotations over time (2013)
Bioinformatics, 29 (4), 476-482 - Characterizing the state of the art in the computational assignment of gene function: Lessons from the first critical assessment of functional annotation (CAFA) (2013)
BMC Bioinformatics, 14 (SUPPL) - Corrigendum to Cortical functional connectivity decodes subconscious, task-irrelevant threat related emotion processing [Neuroimage 61/4 (2012) 1355-1363] (DOI:10.1016/j.neuroimage.2012.03.051) (2013)
NeuroImage, - FTO, obesity and the adolescent brain (2013)
Human Molecular Genetics, 22 (5), 1050-1058 - Genome-wide expression profiling of schizophrenia using a large combined cohort (2013)
Molecular Psychiatry, 18 (2), 215-225 - Meta-analysis of gene coexpression networks in the post-mortem prefrontal cortex of patients with schizophrenia and unaffected controls (2013)
BMC Neuroscience, 14 - MiRNA and miRNA target genes in copy number variations occurring in individuals with intellectual disability (2013)
BMC Genomics, 14 (1) - Molecular differences between chronic and aggressive periodontitis. (2013)
Journal of dental research, 92 (12), 1081-1088 - Neurocarta: Aggregating and sharing disease-gene relations for the neurosciences (2013)
BMC Genomics, 14 (1) - Neuron-enriched gene expression patterns are regionally anti-correlated with oligodendrocyte-enriched patterns in the adult mouse and human brain (2013)
Frontiers in Neuroscience, (7 FEB) - Progress and challenges in the computational prediction of gene function using networks: 2012-2013 update (2013)
F1000Research, 2 - "Guilt by association" is the exception rather than the rule in gene networks (2012)
PLoS Computational Biology, 8 (3) - Application and evaluation of automated methods to extract neuroanatomical connectivity statements from free text (2012)
Bioinformatics, 28 (22), 2963-2970 - Comparison of techniques for correlating survival and gene expression data from wild salmon (2012)
Ecology of Freshwater Fish, 21 (2), 189-199 - Consequences of high temperatures and premature mortality on the transcriptome and blood physiology of wild adult sockeye salmon (Oncorhynchus nerka) (2012)
Ecology and Evolution, 2 (7), 1747-1764 - Cortical functional connectivity decodes subconscious, task-irrelevant threat-related emotion processing (2012)
NeuroImage, 61 (4), 1355-1363 - Decoding unattended fearful faces with whole-brain correlations: An approach to identify condition-dependent large-scale functional connectivity (2012)
PLoS Computational Biology, 8 (3) - Diverse epigenetic strategies interact to control epidermal differentiation (2012)
Nature Cell Biology, 14 (7), 753-763 - Gemma: A resource for the reuse, sharing and meta-analysis of expression profiling data (2012)
Bioinformatics, 28 (17), 2272-2273 - KCTD8 gene and brain growth in adverse intrauterine environment: A genome-wide association study (2012)
Cerebral Cortex, 22 (11), 2634-2642 - Progress and challenges in the computational prediction of gene function using networks (2012)
F1000Research, 1 - Using text mining to link journal articles to neuroanatomical databases (2012)
Journal of Comparative Neurology, 520 (8), 1772-1783 - Genomic signatures predict migration and spawning failure in wild canadian salmon (2011)
Science, 331 (6014), 214-217 - Relationships between gene expression and brain wiring in the adult rodent brain (2011)
PLoS Computational Biology, 7 (1) - The impact of multifunctional genes on guilt "by association "analysis (2011)
PLoS ONE, 6 (2) - The NeuroDevNet Neuroinformatics Core (2011)
Seminars in Pediatric Neurology, 18 (1), 17-20 - The role of indirect connections in gene networks in predicting function (2011)
Bioinformatics, 27 (13), 1860-1866 - Understanding the impact of 1q21.1 copy number variant (2011)
Orphanet Journal of Rare Diseases, 6 (1) - A cross-laboratory comparison of expression profiling data from normal human postmortem brain (2010)
Neuroscience, 167 (2), 384-395 - Bioinformatics techniques in microarray research: applied microarray data analysis using R and SAS software. (2010)
Methods in molecular biology (Clifton, N.J.), 666, 395-417 - Gene function analysis in complex data sets using ErmineJ. (2010)
Nature protocols, 5 (6), 1148-1159 - Outcome of array CGH analysis for 255 subjects with intellectual disability and search for candidate genes using bioinformatics (2010)
Human Genetics, 128 (2), 179-194 - Transcriptional changes in Huntington disease identified using genome-wide expression profiling and cross-platform analysis (2010)
Human Molecular Genetics, 19 (8), 1438-1452 - A methodology for the analysis of differential coexpression across the human lifespan (2009)
BMC Bioinformatics, 10 - Application and evaluation of automated semantic annotation of gene expression experiments (2009)
Bioinformatics, 25 (12), 1543-1549 - Automated recognition of brain region mentions in neuroscience literature (2009)
Frontiers in Neuroinformatics, 3 (SEP) - Gene expression signatures in polyarticular juvenile idiopathic arthritis demonstrate disease heterogeneity and offer a molecular classification of disease subsets (2009)
Arthritis and Rheumatism, 60 (7), 2113-2123 - Granulocyte chemotactic protein 2 (gcp-2/cxcl6) complements interleukin-8 in periodontal disease (2009)
Journal of Periodontal Research, 44 (4), 465-471 - Histone Deacetylase Inhibition Elicits an Evolutionarily Conserved Self-Renewal Program in Embryonic Stem Cells (2009)
Cell Stem Cell, 4 (4), 359-369 - Integration of neuroimaging and microarray datasets through mapping and model-theoretic semantic decomposition of unstructured phenotypes (2009)
Cancer Informatics, 8, 75-94 - Meta-analysis of kindling-induced gene expression changes in the rat hippocampus (2009)
Frontiers in Neuroscience, 3 (SEP) - Subgingival bacterial colonization profiles correlate with gingival tissue gene expression (2009)
BMC Microbiology, 9 - Subtype-specific peripheral blood gene expression profiles in recent-onset juvenile idiopathic arthritis (2009)
Arthritis and Rheumatism, 60 (7), 2102-2112 - Altered brain microRNA biogenesis contributes to phenotypic deficits in a 22q11-deletion mouse model (2008)
Nature Genetics, 40 (6), 751-760 - Gene Ontology term overlap as a measure of gene functional similarity (2008)
BMC Bioinformatics, 9 - Transcriptomes in healthy and diseased gingival tissues (2008)
Journal of Periodontology, 79 (11), 2112-2124 - Activation of MAPK in hearts of EMD null mice: Similarities between mouse models of X-linked and autosomal dominant Emery-Dreifuss muscular dystrophy (2007)
Human Molecular Genetics, 16 (15), 1884-1895 - Activation of MAPK pathways links LMNA mutations to cardiomyopathy in Emery-Dreifuss muscular dystrophy (2007)
Journal of Clinical Investigation, 117 (5), 1282-1293 - Informatics in neuroscience (2007)
Briefings in Bioinformatics, 8 (6), 446-456 - Periodontal therapy alters gene expression of peripheral blood monocytes (2007)
Journal of Clinical Periodontology, 34 (9), 736-747 - Sharing and reusing gene expression profiling data in neuroscience (2007)
Neuroinformatics, 5 (3), 161-175 - β-catenin/TCF/Lef controls a differentiation-associated transcriptional program in renal epithelial progenitors (2007)
Development, 134 (17), 3177-3190 - Akt1 deficiency affects neuronal morphology and predisposes to abnormalities in prefrontal cortex functioning (2006)
Proceedings of the National Academy of Sciences of the United States of America, 103 (45), 16906-16911 - An ancestral haplotype defines susceptibility to doxorubicin nephropathy in the laboratory mouse (2006)
Journal of the American Society of Nephrology, 17 (7), 1796-1800 - Zac1 Regulates an Imprinted Gene Network Critically Involved in the Control of Embryonic Growth (2006)
Developmental Cell, 11 (5), 711-722 - A Mendelian locus on chromosome 16 determines susceptibility to doxorubicin nephropathy in the mouse (2005)
Proceedings of the National Academy of Sciences of the United States of America, 102 (7), 2502-2507 - ErmineJ: Tool for functional analysis of gene expression data sets (2005)
BMC Bioinformatics, 6 - Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms (2005)
Nucleic Acids Research, 33 (18), 5914-5923 - Microarray analysis of gene expression following the formalin test in the infant rat (2005)
Pain, 117 (1-2), 6-18 - Molecular aging in human prefrontal cortex is selective and continuous throughout adult life (2005)
Biological Psychiatry, 57 (5), 549-558 - Transcriptional and behavioral interaction between 22q11.2 orthologs modulates schizophrenia-related phenotypes in mice (2005)
Nature Neuroscience, 8 (11), 1586-1594 - Altered hippocampal transcript profile accompanies an age-related spatial memory deficit in mice (2004)
Learning and Memory, 11 (3), 253-260 - Book Review (2004)
Journal of Biomedical Informatics, 37 (1), 54-55 - Cluster analysis of genes with significant change in expression in cells conditioned to survive TBOOH (2004)
Experimental Eye Research, 78 (2), 301-308 - Coexpression analysis of human genes across many microarray data sets (2004)
Genome Research, 14 (6), 1085-1094 - Gene expression in juvenile arthritis and spondyloarthropathy: Pro-angiogenic ELR+ chemokine genes relate to course of arthritis (2004)
Rheumatology, 43 (8), 973-979 - Gene Expression Profiling of Depression and Suicide in Human Prefrontal Cortex (2004)
Neuropsychopharmacology, 29 (2), 351-361 - Gene expression signatures in chronic and aggressive periodontitis: A pilot study (2004)
European Journal of Oral Sciences, 112 (3), 216-223 - Support vector machine classification on the web (2004)
Bioinformatics, 20 (4), 586-587 - Using the gene ontology for microarray data mining: A comparison of methods and application to age effects in human prefrontal cortex (2004)
Neurochemical Research, 29 (6), 1213-1222 - Bioinformatic analysis of autism positional candidate genes using biological databases and computational gene network prediction (2003)
Genes, Brain and Behavior, 2 (5), 303-320 - Classification and subtype prediction of adult soft tissue sarcoma by functional genomics (2003)
American Journal of Pathology, 163 (2), 691-700 - Classification of clear-cell sarcoma as a subtype of melanoma by genomic profiling (2003)
Journal of Clinical Oncology, 21 (9), 1775-1781 - Hierarchical model of gene regulation by transforming growth factor β (2003)
Proceedings of the National Academy of Sciences of the United States of America, 100 (18), 10269-10274 - Inducible enhancement of memory storage and synaptic plasticity in transgenic mice expressing an inhibitor of ATF4 (CREB-2) and C/EBP proteins (2003)
Neuron, 39 (4), 655-669 - Matrix2png: A utility for visualizing matrix data (2003)
Bioinformatics, 19 (2), 295-296 - Sex genes for genomic analysis in human brain: internal controls for comparison of probe level data extraction (2003)
BMC Bioinformatics, 4 (1) - The effect of replication on gene expression microarray experiments (2003)
Bioinformatics, 19 (13), 1620-1627 - Using ANOVA for gene selection from microarray studies of the nervous system (2003)
Methods, 31 (4), 282-289 - Cutting edge: STAT6 serves as a positive and negative regulator of gene expression in IL-4-stimulated B lymphocytes (2002)
Journal of Immunology, 168 (3), 996-1000 - Differential amplification of gene expression in lens cell lines conditioned to survive peroxide stress (2002)
Investigative Ophthalmology and Visual Science, 43 (10), 3251-3264 - Exploring gene expression data with class scores (2002)
Pacific Symposium on Biocomputing, 474-485 - Learning gene functional classifications from multiple data types (2002)
Journal of Computational Biology, 9 (2), 401-411 - A high-throughput study of gene expression in preterm labor with a subtractive microarray approach (2001)
American Journal of Obstetrics and Gynecology, 185 (3), 716-724 - Analysis of strain and regional variation in gene expression in mouse brain (2001)
Genome Biology, 2 (10) - Gene functional classification from heterogeneous data (2001)
Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB, 249-255 - Pair recordings reveal all-silent synaptic connections and the postsynaptic expression of long-term potentiation (2001)
Neuron, 29 (3), 691-701 - Promoter region-based classification of genes (2001)
Pacific Symposium on Biocomputing, 151-163 - Molecular and functional heterogeneity of hyperpolarization-activated pacemaker channels in the mouse CNS (2000)
Journal of Neuroscience, 20 (14), 5264-5275 - Presynaptic protein kinase activity supports long-term potentiation at synapses between individual hippocampal neurons (2000)
Journal of Neuroscience, 20 (12), 4497-4505 - Synaptic transmission in pair recordings from CA3 pyramidal cells in organotypic culture (1999)
Journal of Neurophysiology, 81 (6), 2787-2797 - Basal and apical synapses of CA1 pyramidal cells employ different LTP induction mechanisms (1996)
Learning and Memory, 3 (4), 289-295 - Seizures and failures in the giant fiber pathway of Drosophila bang-sensitive paralytic mutants (1995)
Journal of Neuroscience, 15 (8), 5810-5819 - The Drosophila easily shocked gene: A mutation in a phospholipid synthetic pathway causes seizure, neuronal failure, and paralysis (1994)
Cell, 79 (1), 23-33