Steven J Jones

Professor


Recruitment

Master's students
Doctoral students
Postdoctoral Fellows
Any time / year round

Bioinformatics Cancer Genomics

Complete these steps before you reach out to a faculty member!

Check requirements
  • Familiarize yourself with program requirements. You want to learn as much as possible from the information available to you before you reach out to a faculty member. Be sure to visit the graduate degree program listing and program-specific websites.
  • Check whether the program requires you to seek commitment from a supervisor prior to submitting an application. For some programs this is an essential step while others match successful applicants with faculty members within the first year of study. This is either indicated in the program profile under "Admission Information & Requirements" - "Prepare Application" - "Supervision" or on the program website.
Focus your search
  • Identify specific faculty members who are conducting research in your specific area of interest.
  • Establish that your research interests align with the faculty member’s research interests.
    • Read up on the faculty members in the program and the research being conducted in the department.
    • Familiarize yourself with their work, read their recent publications and past theses/dissertations that they supervised. Be certain that their research is indeed what you are hoping to study.
Make a good impression
  • Compose an error-free and grammatically correct email addressed to your specifically targeted faculty member, and remember to use their correct titles.
    • Do not send non-specific, mass emails to everyone in the department hoping for a match.
    • Address the faculty members by name. Your contact should be genuine rather than generic.
  • Include a brief outline of your academic background, why you are interested in working with the faculty member, and what experience you could bring to the department. The supervision enquiry form guides you with targeted questions. Be sure to craft compelling answers to these questions.
  • Highlight your achievements and why you are a top student. Faculty members receive dozens of requests from prospective students and you may have less than 30 seconds to pique someone’s interest.
  • Demonstrate that you are familiar with their research:
    • Convey the specific ways you are a good fit for the program.
    • Convey the specific ways the program/lab/faculty member is a good fit for the research you are interested in/already conducting.
  • Be enthusiastic, but don’t overdo it.
Attend an information session

G+PS regularly provides virtual sessions that focus on admission requirements and procedures and offer tips on how to improve your application.

 

Graduate Student Supervision

Doctoral Student Supervision (Jan 2008 - May 2021)
The clinical actionability and evolution of mutational processes in metastatic cancer (2020)

Cancers are characterized by somatic mutation arising from the interplay of mutagen exposure and deficient DNA repair. Whole genome sequencing of tumours reveals characteristic patterns of mutation, known as mutation signatures, which often correspond with specific processes such as cigarette smoke exposure or the loss of a DNA repair pathway. Quantifying DNA repair deficiency can have clinical implications. Cancer chemotherapies which induce DNA damage are known to be more effective against cancers with deficient DNA repair. However, it is not yet known whether mutation signatures can serve as reliable predictive biomarkers for response to these treatments. Furthermore, the current understanding of mutation signatures stems largely from studies of primary, untreated tumours, whereas metastasis underpins as much as 90% of cancer-related mortality. This thesis aims to (1) describe the association between mutation signatures and clinical response to DNA damaging chemotherapy, (2) enable accurate personalized assessment of mutation signatures and their evolution over time, and (3) characterize the evolution of mutational processes in metastatic cancers. To assess clinical actionability, we quantified signatures of single nucleotide variants, structural variants, copy number variants, and small deletions in 93 metastatic breast cancers, 33 of which received platinum-based chemotherapy. We found that patients with signatures of homologous recombination deficiency had improved responses and prolonged treatment durations on platinum-based chemotherapy. Next, we formulated a Bayesian model called SignIT, which improves the accuracy of individualized mutation signature analysis and infers signature evolution over tumour subpopulations. We demonstrated SignIT’s superior accuracy on both simulated data and somatic mutations from The Cancer Genome Atlas, and validated temporal dissection using whole genomes from 24 multiply-sequenced cancers. 
We highlighted a potential clinical application of mutation in a BRCA1-mutated pancreatic adenocarcinoma with low Homologous Recombination Deficiency (HRD) signature but exceptional response to platinum-containing chemotherapy. Finally, we deciphered mutation signatures from nearly 500 metastatic cancer whole genomes, revealing evolution of mutational processes associated with late metastasis and exposure to cytotoxic chemotherapy. Taken together, our findings demonstrate the complex interplay of factors shaping the metastatic cancer genome. We highlight both clinical opportunities of studying genomic instability and the additional insights available from understanding their temporal evolution.


Utility of machine learning approaches for cancer diagnosis and analysis from RNA sequencing (2020)

The highest number of cancer-associated deaths are attributable to metastasis. These include rare cancer types that lack established treatment guidelines, or cancers that become resistant to established lines of therapy. Precision oncology projects aim to develop treatment options for these patients by obtaining a detailed molecular view of the cancer. Scientists use sequencing data like whole-genome sequencing and RNA-sequencing to understand the biology of the cancer. A significant challenge in this process is diagnosing the cancer type of the sample since the observed measurements are best understood with this context. Routine histopathology relies on tissue morphology and can fail to provide a determinative diagnosis when the cancer metastasizes, presents biology attributable to multiple different cancer types, or presents as a rare cancer type. Molecular data has revealed differences in the genetic makeup of cancers that appear morphologically similar, motivating the use of molecular diagnostics. Nevertheless, no existing tools utilize the output from these sequencing modalities in its entirety (that is, without feature selection). There is also limited work evaluating the utility of pan-cancer molecular diagnostics in a precision oncology trial. In this work we review an ongoing precision oncology trial and identify the impact of sequencing-based approaches on cancer diagnosis. We develop SCOPE, a machine-learning method that uses RNA-Seq profiles of tumours for automated cancer diagnosis. We show that this method, which uses over 17,688 gene measurements as input, has better classification accuracy than when using statistically prioritized marker genes, can deconvolve cancer-types with mixed histology, and has high performance in metastatic cancers and cancers of unknown origin. In precision oncology, manual analysis of the tumour's genomic profile is used to understand tumour biology and driver pathways. 
We find that by assessing the classifier's dependence on gene subsets, we can automatically calculate the importance of various biological programs in individual tumours. Pathways prioritized through this tool - called PIE - show a high overlap with manual integrative analysis performed by expert bioinformaticians to identify clinically important genomic changes. Lastly, we demonstrate that PIE facilitates cohort-wide cancer analysis and discovery of novel sub-groups in advanced cancers.


Building and inferring knowledge bases using biomedical text mining (2019)

Biomedical researchers have the overwhelming task of keeping abreast of the latest research. This is especially true in the field of personalized cancer medicine where knowledge from different areas such as clinical trials, preclinical studies, and basic science research needs to be combined. We propose that automated text mining methods should become a commonplace tool for researchers to help them locate relevant research, assimilate it quickly and collate for hypothesis generation. To move towards this goal, we focus on extracting relations from published abstracts and full-text papers. We first explore the use of co-occurrences in sentences and develop a method for inferring new co-occurrences that can be used for hypothesis generation. We next explore more advanced relation extraction methods by developing a supervised learning method, VERSE, which won part of the BioNLP 2016 Shared Task. Our classical method outperforms a deep learning method showing its applicability to text mining problems with limited training data. We develop it further into the Kindred Python package which integrates with other biomedical text mining resources and is easily applied to other biomedical problems. Finally, we examine the applicability of these methods in personalized cancer research. The specific role of genes in different cancer types as drivers, oncogenes, and tumor suppressors is essential information when interpreting an individual cancer genome. We built CancerMine, a high-quality knowledgebase, using the Kindred classifier and annotations from a team of annotators. This allows for quantifiable comparisons of different cancer types based on the importance of different genes. The clinical relevance of cancer mutations is generally locked in the raw text of literature and was the focus of the CIViCmine project. As a collaboration with the Clinical Interpretation of Variants in Cancer (CIViC) project team, we built methods to prioritise relevant papers for curation. 
Through this work, we have focussed on different ways to extract structured knowledge from individual sentences in biomedical publications. The methods, guidelines, and results developed will aid biomedical text mining research and the personalized cancer treatment community.
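
To make the sentence-level co-occurrence idea in the abstract above concrete, here is a minimal Python sketch. It is not the thesis's method and does not use the Kindred package: the function name `count_cooccurrences`, the term list, and the example sentences are all hypothetical, and terms are matched by a naive case-insensitive substring test.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(sentences, terms):
    """Count how often each pair of terms appears in the same sentence.

    Hypothetical helper: matching is a naive case-insensitive substring
    test, which a real pipeline would replace with entity recognition.
    """
    counts = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        # Terms found in this sentence, sorted so each pair has one order.
        present = sorted(t for t in terms if t.lower() in lowered)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Made-up sentences, for illustration only.
sentences = [
    "EGFR mutations are common in lung cancer.",
    "Erlotinib targets EGFR.",
    "Erlotinib is used in lung cancer with EGFR mutations.",
]
terms = ["EGFR", "erlotinib", "lung cancer"]
pairs = count_cooccurrences(sentences, terms)
```

Pairs with high counts would be candidates for closer inspection; the abstract's step of inferring new co-occurrences is not attempted here.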


Genomic Analysis of Head and Neck Endocrine Glands (2015)

Discovering biomarkers and molecular drivers of head and neck endocrine tumors was the inspiration for this thesis. Here, I describe the molecular evaluation of tumors of the thyroid and parathyroid endocrine glands for the purpose of identifying somatic driver alterations in these cancers. While molecular interplay of the germline genomic background of an individual and the somatic genome that emerges throughout the lifetime plays significant roles in increasing the susceptibility to cancer and in driving the malignant phenotype, the major known contributors to cancer remain the acquired somatic mutations. Analysis of a sporadic and recurring parathyroid carcinoma, with incidence of 1 per million population, revealed mutations in mTOR, MLL2, CDKN2C and PIK3CA and comparison of patient-matched primary and recurrent malignant tumors uncovered loss of PIK3CA activating mutation during the evolution of the tumor. Loss of the short arm of chromosome 1 along with somatic missense and truncating mutations in CDKN2C and THRAP3 provided new evidence for the potential role of these as tumor suppressors. Hürthle cell thyroid carcinoma accounts for a small proportion of all thyroid cancers; however, this malignancy often presents at an advanced stage and poses unique challenges. Genomic analysis revealed large regions of copy number variation encompassing nearly the entire genomes accompanied also by near haploidization. Moreover, I identified loss-of-function mutations of the tumor suppressor gene MEN1 in 4% of patients. Repeated alterations of the epigenetic machinery in anaplastic thyroid carcinoma, one of the most fatal of all adult solid malignancies, and novel gene fusions including MKRN1-BRAF, FGFR2-OGDH and SS18-SLC5A11 are reported here. 
The transcriptomic analysis suggested known drug targets such as FGFRs, VEGFRs, KIT and RET to have low expressions in this cancer; however, through integrative data analysis, I identified the mTOR signaling pathway as a potential therapeutic target for anaplastic thyroid cancer. Molecular analysis of papillary thyroid carcinoma and benign thyroid nodules revealed very low mutation rates in these tumors with CYP1B1, PTPRE, CTSH and RUNX1 emerging as promising diagnostic markers. The key somatic mutations identified in these studies can serve as novel diagnostic markers as well as therapeutic targets.


Algorithms and applications of next-generational DNA sequencing (2012)

Next Generation Sequencing (NGS) technologies enable Deoxyribonucleic Acid (DNA) or Ribonucleic Acid (RNA) sequencing to be done at volumes and speeds several orders of magnitude faster than Sanger (dideoxy termination) based methods and have enabled the development of novel experiment types that would not have been practical before the advent of the NGS-based machines. The dramatically increased throughput of these new protocols requires significant changes to the algorithms used to process and analyze the results. In this thesis, I present novel algorithms used for Chromatin Immunoprecipitation and Sequencing (ChIP-Seq) as well as the structures required and challenges faced for working with Single Nucleotide Variations (SNVs) across a large collection of samples, and finally, I present the results obtained when performing an NGS based analysis of eight mammary ductal carcinoma cell lines and four matched normal cell lines.


Bioinformatic approaches to drug repositioning (2012)

Repositioning existing drugs for new therapeutic uses is an efficient approach to drug discovery. However, most successful repositioning cases to date have been serendipitous; the goal of my thesis was to use computational methods to rationally discover drug repositioning candidates. I first virtually screened (VS) 4621 drugs against 252 drug targets with molecular docking. This method emphasized removing potential false positives using stringent criteria from known interaction docking, consensus scores, and rank information. Published literature indicated experimental evidence for 31 top predicted interactions, supporting the approach. The chemotherapeutic nilotinib was validated as a potent MAPK14 inhibitor in vitro (IC50 40 nM), suggesting a potential use in inflammatory diseases. I then applied this method to the cancer target EGFR, predicting the anti-HIV drug tenofovir disoproxil fumarate (TDF) as a novel inhibitor. In vitro, TDF inhibited the proliferation and EGFR-signaling of an EGFR-overexpressing cell line, but did not inhibit EGFR in direct kinase binding assays. This study highlighted limitations of computational and experimental methodologies that should be considered when interpreting or designing other studies. We then screened 1,120 off-patent drugs against the triple-negative breast cancer (TNBC) target p90RSK using both VS and high-throughput (HTS) methods. VS predicted a set of compounds 26-times enriched for known RSK inhibitors and 11 times enriched for HTS hits, underscoring its efficiency. In secondary screens, the chemotherapeutic ellipticine and the bioflavonoids luteolin and apigenin inhibited RSK activity (IC50 0.50-4.77 μM), blocked RSK signaling, and inhibited TNBC cell proliferation. These drugs thus have potential to be repositioned to TNBC. Finally, we rationally repositioned renal cell carcinoma drugs for a patient with a rare tongue adenocarcinoma. 
Whole genome and transcriptome sequencing of the patient’s tumor and normal cells detected sequence, copy number, and expression aberrations, and analysis suggested that the tumor was driven by the RET oncogene. Treatment with RET-inhibiting drugs stabilized the disease for eight months, after which the disease progressed. We also sequenced the post-treatment tumor and found changes consistent with acquired therapeutic resistance. Overall, this thesis details two novel high-throughput approaches for drug repositioning: virtual screening of drugs and targets and personalized medicine via sequencing.


De Novo Detection of Regulatory Elements in the Nematode Caenorhabditis elegans (2009)

No abstract available.

Master's Student Supervision (2010 - 2020)
An investigation into the non-coding genomic landscape and effects of chemotherapeutics in pre-treated advanced cancers (2020)

Cancer is a disease which arises due to somatic alterations in the genome. However, most studies on cancer genetics only explore the impact that coding mutations have on the progression of the disease. Furthermore, many genomic inquiries on cancer only implicate primary untreated tumours, which misses the impact of metastasis and treatment. Here we present a cohort of 638 advanced cancer patients with whole genomic, transcriptomic and clinical information. Through this cohort, we attempt to better characterize the non-coding region of metastatic cancers as well as attempt to understand the mutational impact of chemotherapeutics. Using a positional clustering method, we identified 1,567 significant mutational hotspots in the genome. 86 genes were identified as being affected by a hotspot in a regulatory region, including in the TERT promoter, a region with well-known driving mutations. To characterize the biological function of the hotspots, we analyzed the impact of mutation on corresponding gene expression. We show an increased expression for TERT and AP2A1 when their respective promoter regions are mutated, the latter being a novel association. Mutational clusters affecting non-coding RNAs were also examined for any functional impact, but no significant associations were seen. Large non-coding mutational events such as kataegis were seen in multiple cancer types and across all chromosomes. However, little recurrence was seen for kataegis. Additionally, using observed mutational frequencies, we attempt to identify any mutations that may be treatment-induced. Examining the breast, lung, colon, pancreas and ovarian cohorts, we were able to extract known resistance mutations such as ESR1 mutations after aromatase inhibitor treatment and EGFR T790M mutations post anti-EGFR therapy. Further insights are required to confirm the expressional change seen in the cohort. Additional studies to determine AP2A1’s role in cancer would help understand this correlation. 
Overall, our study shows the presence of important mutations in the non-coding space of metastatic cancers, and the power of whole genome sequencing. Furthermore, we display the need for similar datasets to extrapolate mutations which correlate to resistance.
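
The abstract above does not describe its positional clustering method in detail, so the following Python sketch shows only the general idea under stated assumptions: mutations on the same chromosome are grouped when consecutive positions lie within a fixed gap, and groups with enough mutations are reported. The function name `find_hotspots`, the thresholds `max_gap` and `min_count`, and the coordinates are hypothetical, and no statistical significance test is performed.

```python
from collections import defaultdict


def find_hotspots(mutations, max_gap=50, min_count=3):
    """Group mutations into positional clusters per chromosome.

    mutations: iterable of (chromosome, position) tuples.
    max_gap and min_count are illustrative thresholds, not thesis values.
    Returns a list of (chromosome, start, end, count) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in mutations:
        by_chrom[chrom].append(pos)

    hotspots = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        cluster = [positions[0]]
        for pos in positions[1:]:
            if pos - cluster[-1] <= max_gap:
                cluster.append(pos)  # still inside the current cluster
            else:
                if len(cluster) >= min_count:
                    hotspots.append((chrom, cluster[0], cluster[-1], len(cluster)))
                cluster = [pos]  # start a new cluster
        if len(cluster) >= min_count:
            hotspots.append((chrom, cluster[0], cluster[-1], len(cluster)))
    return hotspots


# Made-up coordinates, for illustration only.
mutations = [
    ("chr5", 1000), ("chr5", 1020), ("chr5", 1045), ("chr5", 9000),
    ("chr7", 500), ("chr7", 5000),
]
hotspots = find_hotspots(mutations)
```

In this toy input only the three nearby chr5 mutations form a cluster; a real analysis would also need a background mutation model to decide which clusters are significant.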


Support vector machines predict advanced cancer patient response to therapies from bulk RNA sequencing data (2020)

Personalized medicine approaches for cancer therapy seek to determine optimal therapies for cancer patients based on the molecular profile of their tumour. The motivation is to target oncogenomic alterations in tumours with the appropriate therapies. However, it is currently infeasible to determine the optimal therapy simply given the genomic profile of a tumour. There has been significant recent work in attempting to use the computational approach of machine learning for predicting tumour drug response. Machine learning methods have been successfully used for drug response prediction in cancer cell lines and even have been extended to predicting individual cancer patient response to a small number of chemotherapies. This work uses support vector machines (SVM) to predict the response to chemotherapies of 570 advanced cancer patients from the BC Cancer Personalized OncoGenomics program using the transcriptomic profile of their tumours. This dataset of advanced cancers presents over 20 cancer types and 130 unique chemotherapies. F-measures for the SVM predictions were found to be as high as 1.0 for some cohorts. Further analysis on the set of important genes for the SVMs revealed biological explanations that may explain the SVM predictions. This work demonstrates the value of large-scale sequencing projects and the potential of data mining and machine learning in personalized cancer medicine.
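
For readers unfamiliar with the F-measure reported above, the sketch below computes the common F1 variant, the harmonic mean of precision and recall. The abstract does not state which variant was used, so F1 is an assumption, and the function name `f_measure` and the labels are hypothetical.

```python
def f_measure(true_labels, predicted_labels, positive=1):
    """Return the F1 score (harmonic mean of precision and recall)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: treat the score as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = responder, 0 = non-responder.
truth = [1, 1, 0, 0, 1]
perfect = f_measure(truth, [1, 1, 0, 0, 1])
imperfect = f_measure(truth, [1, 0, 0, 1, 1])
```

A score of 1.0 requires no false positives and no false negatives, which is easier to reach in small cohorts, so such values are best read alongside cohort size.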


Using convolutional neural networks to predict NRG1-fusions in PDAC biopsy images (2020)

Pancreatic ductal adenocarcinoma (PDAC) is considered the most lethal common cancer, with the highest incidence-to-mortality ratio of any solid tumour. Molecular pathology studies and genomic analyses have improved our understanding of how PDAC develops and progresses, and there has been significant progress in treatment strategies for specific genomic alterations. One of these alterations is the NRG1 gene fusion, which has been found to be a rare, but potentially targetable oncogenic driver. To determine whether PDAC patients have an NRG1 gene fusion, we used convolutional neural networks (CNNs) to analyze digital whole slide images (WSIs) of cancer biopsies. In particular, we used histopathological H&E slides from the Personalized OncoGenomics program to train a deep CNN (VGG-16) framework that automatically classifies normal tissue, NRG1-fusion positive tumour tissue, and NRG1-fusion negative tumour tissue. We implemented the model in two stages, where the first stage classifies normal from tumour tissue, and the second stage classifies the tumour tissue as being NRG1-fusion positive or negative. The model achieved accuracies of 86.5% and 76.0% for each stage, respectively, and an overall accuracy of 68.8%. Additionally, we found that PDAC cases with high expression of the NRG1 gene (93rd-98th percentile of TCGA PDAC cases) were being classified as NRG1-fusion positive, suggesting a possible correlation between NRG1 gene fusions and high parent gene expression. Finally, we attempted to understand the inner workings and decisions of our CNN model by analyzing internal feature maps. We found activation patterns that matched distinct histological features and compared them with a more traditional image segmentation approach. Overall, our findings demonstrate that deep CNNs have the potential to assist pathologists in detecting therapeutically actionable genomic markers.


Using genomic sequencing technology to provide insight into cancer biology and their mechanisms (2020)

Genomic sequencing technology provides insight into cancer pathogenesis and tumoural mechanisms. Tumour RNA sequencing can be used to assess the functionality of genes by allowing for gene expression quantification and transcriptome analysis. Mutational signatures are somatic patterns of mutations arising from specific mutagenic processes such as exogenous and endogenous exposures, defective DNA repair mechanisms or DNA enzymatic editing. Such signatures are “genomic scars” informing on the underlying biological processes that led to cancer. Whole genome sequencing (WGS) of tumour DNA and matched blood DNA as well as whole transcriptome sequencing (WTS) of tumour RNA was performed in advanced cancers of diverse types as part of the Personalized OncoGenomics project. Germline single nucleotide variants (SNVs), copy number variants (CNVs) and structural variants (SVs) in 98 hereditary cancer genes were analyzed from germline WGS data. Somatic SNVs, CNVs and SVs were analyzed from tumour WGS and WTS data. Somatic SNV profiles were used for mutational signature modelling. Gene expression was obtained from WTS. Transcriptome targeted assembly was performed for transcript splicing analysis. We present specific examples demonstrating the usefulness of combined genomic and bioinformatic approaches for understanding clinically unusual cases of cancer and their molecular mechanisms. We used somatic mutational signature profiling to determine the functional impact of germline and somatic variants in MUTYH, a base excision repair gene, on the overall mutational landscape. In Chapter 2, we present a case series of patients with germline MUTYH variants and diverse cancers. We identified two MUTYH variants for which the previous classifications in public databases are inconsistent and we show that these variants cause aberrant splicing and base excision repair deficiency signatures enriched for C:G>A:T transversion mutations. 
Our results support the pathogenicity of these variants. In Chapter 3, we present the example of comprehensive genomic profiling of a rare and uncharacterized tumour, the eccrine porocarcinoma, in which CDKN2A was identified as a potential novel driver. In both chapters, we used transcriptome targeted assembly to detect and characterize aberrant splicing due to selected germline and somatic variants of interest. Supplementary materials available at: http://hdl.handle.net/2429/73595.
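
As a rough illustration of what enrichment for C:G>A:T transversions means in the abstract above, the Python sketch below collapses single nucleotide variants onto the pyrimidine reference base (so G>T is counted together with C>A) and reports the fraction in each substitution class. This is only a counting step, not mutational signature modelling; the function names and the variant list are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Express a substitution relative to the pyrimidine (C or T) strand."""
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def class_fractions(snvs):
    """Return the fraction of SNVs falling in each substitution class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Made-up (reference, alternate) base pairs, for illustration only.
snvs = [("G", "T"), ("C", "A"), ("C", "T"), ("A", "G"), ("G", "T")]
fractions = class_fractions(snvs)
```

Here the C>A class covers both C>A and G>T calls, which is how a C:G>A:T excess would show up in such a tally.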


Characterization of the human thyroid epigenome (2017)

The thyroid gland, necessary for normal human growth and development, is essential for the regulation of metabolism. Its function – to produce and secrete appropriate levels of thyroid hormone – is simple; however accurate assessment of thyroid abnormality is challenging and a fundamental understanding of the normal thyroid is therefore needed. One way to characterize the normal functioning of the thyroid gland is to study the epigenome and resulting transcriptome within its constituent cells. In this study, we compare the consistency of chromatin state annotations across the epigenomes from the grossly uninvolved tumour-adjacent thyroid tissue of four human individuals using ChIP-seq and RNA-seq. We profile four activating (H3K4me1, H3K4me3, H3K27ac, H3K36me3) and two repressing (H3K9me3, H3K27me3) histone modifications, identify chromatin states using a hidden Markov model, produce a novel metric for model selection, and establish epigenomic maps of 19 chromatin states. We found that epigenetic features characterizing promoters and transcription elongation tend to be more consistent across epigenomes and that epigenetically active genes consistent across all epigenomes tend to have higher expression than those that are not marked as epigenetically active in all samples. We also identified a set of 18 genes epigenetically active and consistently expressed in the thyroid that are likely relevant to thyroid function. Altogether, we believe the epigenomes presented in this work represent a useful resource to gain a deeper understanding of the underlying molecular biology of thyroid function and provide contextual information of thyroid and human epigenomic data for comparison and integration into future studies.


Latent Semantic Analysis for retrieving related biomedical articles (2017)

Retrieving relevant scientific papers in a scalable way is increasingly important, as more and more studies are published. PubMed’s relevant article recommendation is based on MeSH assignments by indexers, which requires significant human resources and can become a limitation in making papers searchable. Many recommendation systems use singular value decomposition (SVD) to pre-compute related products. In this study, we look at using latent semantic analysis (LSA), an application of SVD to determine relationships in a set of documents and terms, to find related biomedical papers. We focused on determining the best parameters for SVD in retrieving relevant biomedical articles given a paper of interest. Using PubMed's recommendations as guidance, we found that using cosine distance to measure document similarity leads to better results than using Euclidean distance. We re-evaluated other parameters, including the weighting scheme and the number of singular values and using a larger abstract corpus. Finally, we asked people to compare the relevant abstract retrieved with our method against those retrieved by PubMed. Our method retrieved sensible articles that were chosen over PubMed's relevant papers one-third of the time. We looked into the abstracts retrieved by either method and discuss possible areas for experimentation and improvement.
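
The comparison of cosine and Euclidean distance in the abstract above can be illustrated with a small Python sketch. It skips the SVD step entirely and uses made-up vectors standing in for documents in a reduced LSA space. It shows one known property, that cosine distance ignores vector magnitude, and is not a claim about why the thesis obtained its result.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Return the straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical document vectors, for illustration only.
query = [1.0, 2.0, 0.0]
same_topic_longer = [3.0, 6.0, 0.0]  # same direction, larger magnitude
other_topic = [2.0, 0.0, 1.0]        # different direction, similar magnitude

cos_same = cosine_distance(query, same_topic_longer)
cos_other = cosine_distance(query, other_topic)
euc_same = euclidean_distance(query, same_topic_longer)
euc_other = euclidean_distance(query, other_topic)
```

With these vectors, cosine distance ranks the same-direction document as nearest, while Euclidean distance prefers the document with similar magnitude but a different direction.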


Bioinformatics approach to investigate genetic differences underlying breast tumours with specific outcomes of adoptive T-cell therapy using a mouse model (2010)

The immune system plays a critical role in cancer prevention and development. The stimulation of natural immune reaction in a cancer patient by adoptive T-cell therapy has shown success in treating metastatic melanomas and renal cell carcinomas. However, the use of adoptive T-cell therapy remains limited due to unpredictable outcomes and low response rates. In particular, adoptive T-cell therapy for breast cancer has not been realized, despite the presence of immunogenic antigens such as over-expressed HER2, present in 20-40% of breast tumours. Using a unique transgenic mouse model, the global profiles of gene expression, miRNA abundance and single nucleotide variants (SNVs) were investigated to identify the molecular difference of murine mammary tumours with isogenic background, which exhibited complete regression (CR), partial regression (PR) or progressive disease (PD) outcome of adoptive T-cell therapy. The bioinformatics analyses were further carried out to identify uniquely activated pathways, prognostic gene expression signatures, the effect of post-transcriptional gene regulation and mutated genes unique to tumours with specific outcome. The largest differences in gene expression, miRNA and SNV profiles were repeatedly observed between the regressing (CR, PR) and non-regressing (PD) tumours, supporting the attribution of molecular differences to the immunotherapy outcome. In particular, the gene expression signatures derived from genes in immune-related pathways were experimentally validated to be strong prognostic markers for predicting the CR outcome. Comparison with the human breast cancer subtypes further revealed similarities of the non-regressing tumours with the basal subtype, and the regressing tumours with the HER2 subtype. The difference in miRNA profiles between CR and PR tumours suggested potential translational activities unique to PR, which was nearly identical to CR at the transcriptome level. 
The findings from this study show that tumour-derived factors that either promote or suppress the immune system are responsible for the varying outcome of immunotherapy, and that the molecular characteristics can be further applied for the development of clinical prognostic tools, cancer vaccines and drug targets to enhance the efficacy of adoptive T-cell therapy.

