Abstract
Subclonal mutations reveal important features of the genetic architecture of tumors. However, accurate detection of mutations in genetically heterogeneous tumor cell populations using next-generation sequencing remains challenging. We develop MuSE (http://bioinformatics.mdanderson.org/main/MuSE), Mutation calling using a Markov Substitution model for Evolution, a novel approach for modeling the evolution of the allelic composition of the tumor and normal tissue at each reference base. MuSE adopts a sample-specific error model that reflects the underlying tumor heterogeneity to greatly improve the overall accuracy. We demonstrate the accuracy of MuSE in calling subclonal mutations in the context of large-scale tumor sequencing projects using whole exome and whole genome sequencing.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-1029-6) contains supplementary material, which is available to authorized users.
Highlights
The detection of somatic point mutations is a key component of cancer genomic research that has been rapidly developing since next-generation sequencing (NGS) technology revealed its potential for describing genetic alterations in cancer [1,2,3,4,5,6].
The first step, ‘MuSE call’, implements the heuristic pre-filters and uses the Markov substitution model to describe the evolution of the reference allele to the allelic composition of the matched tumor and normal tissue at each genomic locus, which provides the summary statistic π_{somatic}. The receiver operating characteristic (ROC) curve associated with π_{somatic,tumor} lies above that from Caller A, suggesting that the MuSE pipeline has a good ability to discriminate mutations from references.
The second step, ‘MuSE sump’, identifies tier-based cutoffs on π_{somatic,tumor}. We build a sample-specific error model to account for tumor heterogeneity and to identify cutoffs that are unique to each sample, achieving high accuracy in mutation calling.
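To make the idea of a Markov substitution model concrete, the sketch below computes allele-to-allele transition probabilities under an equal-input (F81-type) continuous-time Markov model. The choice of this particular model is an assumption made for illustration: the text above does not say which substitution model MuSE uses, and the function name, the equilibrium frequencies and the evolutionary time are all hypothetical.

```python
import math


def substitution_probabilities(equilibrium, t):
    """Transition matrix P(t) of an equal-input (F81-type) Markov model.

    equilibrium: stationary frequencies of the alleles (e.g. A, C, G, T);
                 must sum to 1.
    t:           evolutionary time, scaled so that one unit corresponds to
                 one expected substitution per site.

    Returns a list of rows where P[i][j] is the probability that allele i
    has been replaced by allele j after time t.
    """
    if abs(sum(equilibrium) - 1.0) > 1e-9:
        raise ValueError("equilibrium frequencies must sum to 1")
    homozygosity = sum(p * p for p in equilibrium)
    if homozygosity >= 1.0:
        raise ValueError("at least two alleles need non-zero frequency")

    # Rate scaling that gives one expected substitution per unit time.
    beta = 1.0 / (1.0 - homozygosity)
    # Probability that no substitution event has hit the site by time t.
    stay = math.exp(-beta * t)

    n = len(equilibrium)
    return [
        [
            stay * (1.0 if i == j else 0.0) + (1.0 - stay) * equilibrium[j]
            for j in range(n)
        ]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical inputs: uniform allele frequencies and a short time.
    for row in substitution_probabilities([0.25, 0.25, 0.25, 0.25], 0.1):
        print([round(p, 4) for p in row])
```

In this simplified picture, each row describes how a reference allele could evolve into the alleles observed in a sample. At t = 0 the matrix is the identity, and for large t every row approaches the equilibrium frequencies.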
Summary
The detection of somatic point mutations is a key component of cancer genomic research that has been rapidly developing since next-generation sequencing (NGS) technology revealed its potential for describing genetic alterations in cancer [1,2,3,4,5,6]. As the cost of NGS has decreased, the need to thoroughly interrogate the cancer genome has spurred the migration from whole exome sequencing (WES) to whole genome sequencing (WGS). The sequencing depth decreases from 100–200× for WES. Another nontrivial difficulty in mutation calling is accounting for the commonly observed influence of tumor heterogeneity. The presence of both normal cells and tumor subclones in the sample causes this phenomenon to vary from sample to sample [7, 8]. Tier-based variant call sets that inherently attach uncertainties will be helpful when evaluating the behavior of low variant allele fraction (VAF) mutations and seeking to understand the effect of tumor heterogeneity.
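The interplay between sequencing depth and low-VAF mutations can be illustrated with a simple binomial calculation. The sketch below assumes that each read at a locus independently carries the variant allele with probability equal to the VAF, and it ignores sequencing errors and mapping artifacts. The function name, the depths, the VAF and the minimum read count are hypothetical values chosen for illustration, not parameters taken from the paper.

```python
from math import comb


def prob_at_least_k_alt_reads(depth, vaf, k):
    """Probability of seeing at least k variant-supporting reads.

    Assumes the number of variant reads is Binomial(depth, vaf), i.e. reads
    are independent and sequencing errors are ignored.
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must lie in [0, 1]")
    # P(X < k) is the sum of the binomial probabilities for 0 .. k-1 reads.
    p_fewer = sum(
        comb(depth, i) * vaf**i * (1.0 - vaf) ** (depth - i)
        for i in range(min(k, depth + 1))
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Hypothetical scenario: a subclonal mutation at 5% VAF, requiring at
    # least 3 supporting reads, at a higher and a lower sequencing depth.
    for depth in (150, 30):
        p = prob_at_least_k_alt_reads(depth, vaf=0.05, k=3)
        print(f"depth {depth:>3}x: P(>= 3 variant reads) = {p:.3f}")
```

Under these assumptions, the chance of observing enough supporting reads for a low-VAF mutation drops as depth falls, which is one reason call sets that carry an explicit measure of uncertainty are useful for such mutations.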