Abstract

Background: Accurate detection of somatic mutations is challenging but critical in understanding cancer formation, progression, and treatment. We recently proposed NeuSomatic, the first deep convolutional neural network-based somatic mutation detection approach, and demonstrated performance advantages on in silico data.

Results: In this study, we use the first comprehensive and well-characterized somatic reference data sets from the SEQC2 consortium to investigate best practices for using a deep learning framework in cancer mutation detection. Using the high-confidence somatic mutations established for a cancer cell line by the consortium, we identify the best strategy for building robust models on multiple data sets derived from samples representing real scenarios. For example, a model trained on a combination of real and spike-in mutations had the highest average performance.

Conclusions: The strategy identified in our study achieved high robustness across multiple sequencing technologies for fresh and FFPE DNA input, varying tumor/normal purities, and different coverages, with significant superiority over conventional detection approaches in general, as well as in challenging situations such as low coverage, low variant allele frequency, DNA damage, and difficult genomic regions.

Highlights

  • Accurate detection of somatic mutations is challenging but critical in understanding cancer formation, progression, and treatment

  • In contrast to earlier efforts to establish benchmarking data for somatic mutation detection [4,5,6,7], this dataset has been well-characterized by the SEQC2 consortium and the truth set was defined with a comprehensive process yielding a high validation rate (>99.9%) [3] and used in benchmarking the performance of whole-genome sequencing (WGS) and whole exome sequencing (WES) studies [2]

  • For a broad assessment of consistency and reproducibility of predictions, we used a total of 119 replicates from diverse data sets representing realistic cancer detection applications including real whole-genome sequencing (WGS), whole exome sequencing (WES), and AmpliSeq targeted sequencing

Summary

Introduction

Accurate detection of somatic mutations is challenging but critical in understanding cancer formation, progression, and treatment. It enables precise diagnosis, prognosis, and treatment of cancer patients [1]. In contrast to earlier efforts to establish benchmarking data for somatic mutation detection [4,5,6,7], the SEQC2 reference data set has been well characterized by the consortium; its truth set was defined with a comprehensive process yielding a high validation rate (>99.9%) [3] and has been used in benchmarking the performance of WGS and WES studies [2]. Derived from the first comprehensive and well-characterized paired tumor-normal reference cancer samples, this data set, along with the accompanying sequencing data prepared at multiple sites with multiple technologies, provides a unique resource for learning-based somatic mutation detection techniques.
