Abstract

Cancer genomes are known to harbor a wide range of mutations, including complex variants that combine insertions and deletions (InDels). Next-generation sequencing (NGS) has revolutionized our understanding of mutations in cancer. Most current variant callers for NGS data focus on single-nucleotide variants (SNVs) or short InDels. Detection of complex variants, however, remains a challenge and is beyond the reach of most current variant callers. In addition, efficient analysis of ultra-deep targeted sequencing without downsampling, which is becoming more routine for detecting low-frequency mutations in heterogeneous cancer samples, is also a challenge and is not handled well by most current variant callers. Here, we describe VarDict, a novel and versatile variant caller for ultra-deep targeted sequencing, exome, whole-genome, and RNA-seq data. VarDict is designed for heterogeneous cancer genomes and can simultaneously call SNVs, MNVs (multiple-nucleotide variants), InDels (of user-defined sizes), and complex variants (combinations of the aforementioned events). VarDict handles ultra-deep sequencing runs with mean coverage of up to 1M without downsampling or significant loss of performance. It performs local realignment around InDels on the fly and rescues soft-clipped reads, which gives more accurate estimates of allele frequencies and allows it to call InDels supported only by soft-clipped reads. In addition, it performs amplicon-aware variant calling for PCR-based targeted sequencing by avoiding variant calls in PCR primers, discounting primer depths, and, importantly, detecting variants with amplicon bias, a common artifact of PCR-based targeted sequencing. VarDict can also be run in paired mode to identify somatic or loss-of-heterozygosity (LOH) variants, as well as variants whose allele frequencies have shifted significantly. It thus enables paired DNA-seq and RNA-seq variant calling, which most variant callers do not handle well.
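As a rough illustration of the paired mode described above, the sketch below computes allele frequencies from read counts and assigns a coarse label to a tumor/normal pair. It is not VarDict's implementation: the abstract does not describe the actual classification rules, so the names `AlleleCounts` and `classify_pair` and every threshold are hypothetical, and no statistical test for a significant allele-frequency shift is attempted.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Read support for one variant in one sample (hypothetical structure)."""
    alt_reads: int    # reads supporting the variant allele
    total_reads: int  # total depth at the position

    @property
    def allele_frequency(self) -> float:
        # Guard against zero depth so uncovered positions report 0.0.
        return self.alt_reads / self.total_reads if self.total_reads else 0.0


def classify_pair(
    tumor: AlleleCounts,
    normal: AlleleCounts,
    min_af: float = 0.01,    # assumed detection floor, not a VarDict default
    het_low: float = 0.25,   # assumed lower bound of a heterozygous call
    het_high: float = 0.75,  # assumed upper bound of a heterozygous call
) -> str:
    """Assign a coarse label to a variant seen in a tumor/normal pair."""
    t = tumor.allele_frequency
    n = normal.allele_frequency
    tumor_present = t >= min_af
    normal_present = n >= min_af

    # Present in the tumor only: candidate somatic variant.
    if tumor_present and not normal_present:
        return "somatic"
    # Heterozygous in the normal, but lost or fixed in the tumor: candidate LOH.
    if normal_present and het_low <= n <= het_high and (t < min_af or t > het_high):
        return "loh"
    # Present in both samples without an LOH-like pattern: treat as germline.
    if tumor_present and normal_present:
        return "germline"
    return "unclassified"


if __name__ == "__main__":
    print(classify_pair(AlleleCounts(50, 1000), AlleleCounts(0, 1000)))
    print(classify_pair(AlleleCounts(950, 1000), AlleleCounts(500, 1000)))
```

A production caller would replace the fixed cutoffs with a statistical comparison of the two samples' read counts and would account for depth, base quality, and strand bias.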
To demonstrate the value of VarDict in practice, we applied it to whole-genome sequencing (WGS) data from NA12878 and compared the results to the calls made by Genome In A Bottle (GiaB). VarDict called >96% of the variants in GiaB. In addition, it found many more variants likely missed by GiaB, especially complex variants, many of which had never been categorized before. We further applied VarDict in the ICGC-TCGA DREAM Mutation Calling challenge (syn312572). We found it to be as sensitive in calling SNPs as more commonly used somatic SNP callers such as MuTect, FreeBayes, and VarScan, and more sensitive in calling InDels than an array of other variant callers, including VarScan, FreeBayes, and Scalpel. VarDict is fully open source, implemented in Perl, and uses memory efficiently regardless of depth, making it an HPC-cluster-friendly tool. For ease of deployment, VarDict has also been integrated into bcbio-nextgen, an open-source framework for scalable NGS analysis. VarDict is freely available on GitHub (https://github.com/AstraZeneca-NGS/VarDict).

Citation Format: Zhongwu Lai, Aleksandra Markovets, Miika Ahdesmaki, Justin Johnson. VarDict: A novel and versatile variant caller for next-generation sequencing in cancer research. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 4864. doi:10.1158/1538-7445.AM2015-4864
