Sara Mostafavi

Assistant Professor

Graduate Student Supervision

Doctoral Student Supervision (Jan 2008 - Nov 2019)
DNA methylation microarray data reduction for co-methylation analysis (2020)

DNA methylation (DNAm) is an epigenetic modification that is present across the human genome, primarily in the context of CpG dinucleotides. In human population studies, high-throughput bead chip microarray assays are the prevalent way to simultaneously measure the methylation state of many thousands of genomic CpG sites. Proximal genomic CpGs have correlated methylation states within a single cell and often function as a single biological unit. The shared methylation state of multiple CpGs within such a biological unit has been the subject of intense study, due to its immediate relevance for gene expression regulation and ultimately for health and disease. I designed and implemented a biologically motivated method for DNAm array data reduction, which constructs co-methylated regions (CMRs) while incorporating information about the genomic CpG background from the reference human genome annotation. The method relies on the correlations of CpG methylation across individuals for proximal CpG probes. It aims for improved statistical power and specificity, including in downstream applications. For example, epigenome-wide association studies (EWAS), one important such application, often place the focus on group “hits” with multiple adjacent CpGs that are significant, because their genomic proximity makes it more likely that the detected correlations are not spurious. The CMRs capture such groups, and I showed that the CMRs constructed in public whole blood data have high statistical specificity in the context of EWAS for chronological age and biological sex. When the composite CMR methylation measures were used to perform EWAS for age and sex, they had high sensitivity and specificity, including uncovering additional associated CpGs not detected by conventional EWAS.
The utility of the data reduction method was further discussed within the broader context of applying machine learning algorithms to high-dimensional DNAm array data analysis.
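
The idea of grouping proximal, correlated CpG probes can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the 1000 bp distance cutoff, the 0.3 correlation cutoff, and the use of a per-individual mean as the composite measure are all assumptions made for illustration, and the sketch ignores the genomic CpG background that the actual method incorporates.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def build_cmrs(probes, max_gap=1000, min_corr=0.3):
    """Group adjacent probes into candidate co-methylated regions.

    probes: list of (probe_id, chrom, position, values) tuples, sorted by
    chromosome and position; values holds one methylation value per individual.
    max_gap and min_corr are hypothetical cutoffs, not values from the thesis.
    Returns a list of regions, each a list of probe ids (two or more probes).
    """
    regions, current = [], []
    for probe in probes:
        if current:
            prev = current[-1]
            adjacent = prev[1] == probe[1] and probe[2] - prev[2] <= max_gap
            if adjacent and pearson(prev[3], probe[3]) >= min_corr:
                current.append(probe)
                continue
            # The chain is broken: keep it only if it spans several probes.
            if len(current) >= 2:
                regions.append([p[0] for p in current])
        current = [probe]
    if len(current) >= 2:
        regions.append([p[0] for p in current])
    return regions


def cmr_composite(probes, region):
    """Per-individual mean methylation over the probes in one region (assumed summary)."""
    members = [p[3] for p in probes if p[0] in region]
    return [sum(vals) / len(members) for vals in zip(*members)]
```

Each region returned by `build_cmrs` can be passed to `cmr_composite` to obtain one value per individual, which could then stand in for the individual probes in a downstream association test.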


Master's Student Supervision (2010 - 2018)
Molecular interpretation of genome-wide association studies using multiomics analysis (2018)

Genome-wide association studies have found thousands of single-nucleotide polymorphisms associated with various human traits. Recently, a powerful statistical approach called MetaXcan has been proposed for interpreting genome-wide associations at the gene level. We extended MetaXcan to a multiomics application, using a brain cortex reference dataset that includes gene expression, DNA methylation, and histone acetylation data from approximately 400 individuals. Our approach, Multi-MetaXcan, consists of three steps. In the first step, we use regularized regression to build models that predict gene expression and variation in epigenomic modifications from single-nucleotide polymorphisms. We call these models genotype-based imputation models. In the second step, we apply these models to map genome-wide associations to gene-level and epigenomic-level associations. Finally, in the third step, our model summarizes all molecular-level associations at the gene level by building epigenome-based imputation models that predict gene expression levels from nearby epigenomic marks like CpG sites and transcriptionally active regions. In summary, Multi-MetaXcan identifies trait-associated genes whose expression levels are impacted by single-nucleotide polymorphisms and their influence on intermediate molecular traits such as DNA methylation and histone acetylation. We applied Multi-MetaXcan to a major depressive disorder genome-wide association study. As a result, we discovered 12 genes, 25 transcriptionally active regions, and 163 CpG sites associated with major depressive disorder, corresponding to 74 genes in total. Of these, 26 genes fall within or close to previously identified major depressive disorder-associated genomic regions. Importantly, the inclusion of epigenomic data resulted in an additional 62 genes that were not identified by the gene expression imputation model alone.
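
The first step, a genotype-based imputation model, can be sketched as a regularized regression from SNP dosages to a molecular trait. The abstract does not say which penalty was used, so this example swaps in plain ridge regression fitted by coordinate descent. The function names, the penalty `lam`, and the iteration count are hypothetical choices for illustration, not the Multi-MetaXcan implementation.

```python
def fit_ridge(genotypes, trait, lam=1.0, n_iter=200):
    """Fit trait ~ intercept + genotypes @ weights with an L2 penalty.

    genotypes: one row per individual, each a list of SNP dosages (0, 1, or 2).
    trait: one value per individual (e.g. expression of one gene).
    lam: ridge penalty (hypothetical default; must be positive).
    Returns (intercept, weights).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    n, p = len(genotypes), len(genotypes[0])
    # Center predictors and response so the intercept is not penalized.
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    y_mean = sum(trait) / n
    cols = [[row[j] - col_means[j] for row in genotypes] for j in range(p)]
    resid = [v - y_mean for v in trait]  # residual y - Xw, with w = 0 at start
    sq_norms = [sum(v * v for v in col) for col in cols]
    weights = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            xj = cols[j]
            # Correlation of SNP j with the residual that excludes its own effect.
            rho = sum(xj[i] * (resid[i] + xj[i] * weights[j]) for i in range(n))
            new_w = rho / (sq_norms[j] + lam)
            delta = new_w - weights[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xj[i] * delta
                weights[j] = new_w
    intercept = y_mean - sum(w * m for w, m in zip(weights, col_means))
    return intercept, weights


def predict(intercept, weights, dosages):
    """Impute the trait for one individual from SNP dosages."""
    return intercept + sum(w * d for w, d in zip(weights, dosages))
```

In the three-step scheme described above, one such model would be fitted per gene, CpG site, or acetylation peak, using only nearby SNPs as predictors. A larger `lam` shrinks the SNP weights toward zero, which trades bias for stability when the reference panel is small.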




