Abstract
Accurate identification of somatic mutations is key to targeted cancer therapies. However, next-generation sequencing is complex, and many variables give rise to sequencing and alignment artifacts. Most sequencing workflows that use tumor-normal paired samples rely on the intersection or union of mutations called by more than one somatic variant caller, such as Mutect2, Strelka, VarScan2, MuSE, and SomaticSniper. Yet the resulting call sets still contain a number of false positive and false negative calls. Therefore, variants need to be reviewed manually in the Integrative Genomics Viewer (IGV) to incorporate information that is difficult to take into account in automated workflows. To alleviate the burden of manual inspection, we present an automated classification method that preclassifies detected somatic variants as true or false ahead of further manual review. Methods: We created a false positive classifier that uses automated IGV visualization of somatic variants. Our somatic workflow involves tumor-normal DNA sequencing, BWA-MEM alignment, deduplication of reads with Picard, somatic calling with two somatic callers, the union of the detected variants, SnpEff annotation, and extraction of nonsilent variants with variant allele fractions greater than 5 percent. We used data for 1,413 nonsilent variants from a set of 10 tumor-normal pairs of various cancer types. The variants were scored by cancer analysts using IGV. The initial false positive rate was 28 percent. We fine-tuned a residual deep neural network initialized with weights from a network of the same architecture trained on ImageNet. Discriminative fine-tuning was performed in two stages: first unfreezing only the last fully connected layer, and then unfreezing all layers and training with differential learning rates. Results and Conclusion: Our method achieves 93 percent overall accuracy.
The classifier greatly reduces the false positive rate for the set of variant calls, from 28 percent to 4 percent, at the expense of calling 3 percent of true variants false. Deep neural networks have previously been shown to yield state-of-the-art performance in multiple domains, including image, speech, and text processing. In this work, we describe a way to fine-tune a residual neural network trained on a large image data set to detect false positive variants in tumor-normal variant alignment snapshots from IGV. Our general approach, which significantly reduces the false positive rate of putative variants, can be extended to any subset of somatic callers.
Citation Format: Alena S. Harley, Corine K. Lau, Eve Shinbrot. Automated somatic variant classifier to reduce false positives identified by tumor normal variant callers [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 2474.
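The variant selection step named in the Methods (union of the calls from two somatic callers, then nonsilent variants with a variant allele fraction above 5 percent) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Variant` record, its field names, the `SILENT_EFFECTS` set, and the toy calls are all hypothetical, and a real workflow would read these values from annotated VCF files.

```python
from dataclasses import dataclass

# Assumption: which annotation effects count as "silent". The abstract does
# not list them; these Sequence Ontology terms are illustrative only.
SILENT_EFFECTS = {"synonymous_variant", "intron_variant", "intergenic_region"}
MIN_VAF = 0.05  # "greater than 5 percent" in the abstract


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one somatic call (not a real VCF parser)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    effect: str     # top annotation effect, e.g. as reported by SnpEff
    alt_reads: int  # tumor reads supporting the alternate allele
    depth: int      # total tumor reads covering the site

    @property
    def key(self):
        # Two callers report the same variant if site and alleles match.
        return (self.chrom, self.pos, self.ref, self.alt)

    @property
    def vaf(self):
        # Variant allele fraction; 0.0 when the site has no coverage.
        return self.alt_reads / self.depth if self.depth else 0.0


def union_calls(calls_a, calls_b):
    """Union of two call sets; a variant found by both callers is kept once."""
    merged = {}
    for variant in list(calls_a) + list(calls_b):
        merged.setdefault(variant.key, variant)
    return list(merged.values())


def select_for_review(calls_a, calls_b):
    """Keep nonsilent variants whose VAF is strictly above MIN_VAF."""
    return [
        v for v in union_calls(calls_a, calls_b)
        if v.effect not in SILENT_EFFECTS and v.vaf > MIN_VAF
    ]


# Toy calls, invented for illustration only.
caller_a = [
    Variant("chr1", 100, "A", "T", "missense_variant", 12, 100),
    Variant("chr2", 200, "G", "C", "synonymous_variant", 30, 100),
]
caller_b = [
    Variant("chr1", 100, "A", "T", "missense_variant", 12, 100),  # duplicate
    Variant("chr3", 300, "C", "T", "stop_gained", 3, 100),        # VAF 0.03
    Variant("chr4", 400, "T", "G", "missense_variant", 5, 100),   # VAF 0.05
]

selected = select_for_review(caller_a, caller_b)
print([v.key for v in selected])
```

In this toy input only the chr1 missense call passes: the chr2 call is silent, and the chr3 and chr4 calls do not exceed the 5 percent threshold. Variants selected this way would be the ones rendered as IGV snapshots and passed to the classifier.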