Abstract
To detect functional somatic mutations in tumor samples, whole-exome sequencing (WES) is often used for its reliability and relatively low cost. RNA-seq, while generally used to measure gene expression, can potentially also be used to identify somatic mutations. However, there has been little systematic evaluation of the utility of RNA-seq for this purpose. Here, we develop and evaluate a pipeline for processing RNA-seq data from glioblastoma multiforme (GBM) tumors in order to identify somatic mutations. The pipeline uses the STAR aligner 2-pass procedure jointly with MuTect2 from the Genome Analysis Toolkit (GATK) to detect somatic variants. Variants identified from RNA-seq data were evaluated by comparison against the COSMIC and dbSNP databases, and were also compared to somatic variants identified by exome sequencing. We also estimated the putative functional impact of coding variants in the most frequently mutated genes in GBM. Interestingly, variants identified by RNA-seq alone showed better representation of GBM-related mutations cataloged by COSMIC. RNA-seq-only data substantially outperformed WES in revealing potentially new somatic mutations in known GBM-related pathways, and allowed us to build a high-quality set of somatic mutations common to exome and RNA-seq calls. Using RNA-seq data in parallel with WES data to detect somatic mutations in cancer genomes can thus broaden the scope of discoveries and lend additional support to somatic variants identified by exome sequencing alone.
Highlights
Cancer is among the leading causes of death worldwide, with 8.7 million deaths in 2015 (Global Burden of Disease Cancer Collaboration, 2017)
RNA-seq data from The Cancer Genome Atlas (TCGA) samples showed low numbers of variants excluded by this filter (Fig. 2C), which could be due to coverage differences between tumor RNA-seq and matched-normal
COSMIC/dbSNP overlap can be used as an indicator of the somatic/germline content in TCGA samples
For each of the three classes of variants (WES-only, Intersection and RNA-seq-only), we examined the proportion in different genomic regions (Fig. 3C), potential for affecting protein function (Fig. 3D) and representation in the dbSNP and COSMIC databases (Fig. 3E)
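The three variant classes named in the highlights, and the database overlap used as a somatic/germline indicator, can be illustrated with plain set operations. The following is a minimal sketch, not the authors' implementation: the `Variant` type, the function names `partition_calls` and `overlap_fraction`, and all coordinates are hypothetical placeholders, and it assumes variants match only when chromosome, position, reference and alternate alleles are identical.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant key: exact match on all four fields."""
    chrom: str
    pos: int
    ref: str
    alt: str


def partition_calls(wes: Iterable[Variant], rna: Iterable[Variant]) -> Dict[str, Set[Variant]]:
    """Split two call sets into WES-only, Intersection and RNA-seq-only classes."""
    wes_set, rna_set = set(wes), set(rna)
    return {
        "WES-only": wes_set - rna_set,
        "Intersection": wes_set & rna_set,
        "RNA-seq-only": rna_set - wes_set,
    }


def overlap_fraction(variants: Iterable[Variant], database: Iterable[Variant]) -> float:
    """Fraction of `variants` that also appear in `database` (0.0 if empty)."""
    variant_set = set(variants)
    if not variant_set:
        return 0.0
    return len(variant_set & set(database)) / len(variant_set)


# Made-up toy calls for illustration only; these are not data from the study.
a = Variant("chr1", 100, "A", "G")
b = Variant("chr2", 200, "C", "T")
c = Variant("chr3", 300, "G", "A")

wes_calls = [a, b]
rna_calls = [b, c]
cosmic_like = [c]  # stand-in for a somatic-mutation catalog
dbsnp_like = [a]   # stand-in for a germline-polymorphism catalog

classes = partition_calls(wes_calls, rna_calls)
for name, members in classes.items():
    print(name, len(members),
          overlap_fraction(members, cosmic_like),
          overlap_fraction(members, dbsnp_like))
```

Under these assumptions, a class with a high fraction in the somatic catalog and a low fraction in the germline catalog would be read as enriched for somatic calls. Real call sets would additionally need allele normalization before exact matching.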
Summary
Cancer is among the leading causes of death worldwide, with 8.7 million deaths in 2015 (Global Burden of Disease Cancer Collaboration, 2017). Cancers are driven in part by the accumulation of somatic mutations, which, incidentally, offer targets for new precision therapies directed against tumor-causing mutations (The Cancer Genome Atlas Research Network et al., 2013; Yu, O’Toole & Trent, 2015). Advances in next-generation sequencing technologies have allowed increasingly fast, accurate and cost-efficient analysis of DNA and RNA samples, which has driven the identification of key cancer-driving mutations (Raphael et al., 2014). These findings are beginning to pave the way for new targeted therapies in many cancers, but significant challenges remain (Paez et al., 2004; Taylor, Furnari & Cavenee, 2012).