Abstract
Background: Next-generation sequencing of matched tumor and normal biopsy pairs has become a technology of paramount importance for precision cancer treatment. Sequencing costs have dropped tremendously, allowing the sequencing of the whole exome of tumors for just a fraction of the total treatment costs. However, clinicians and scientists cannot take full advantage of the generated data because the accuracy of analysis pipelines is limited. This particularly concerns the reliable identification of subclonal mutations in a cancer tissue sample with very low frequencies, which may be clinically relevant.

Results: Using simulations based on kidney tumor data, we compared the performance of nine state-of-the-art variant callers, namely deepSNV, GATK HaplotypeCaller, GATK UnifiedGenotyper, JointSNVMix2, MuTect, SAMtools, SiNVICT, SomaticSniper, and VarScan2. The comparison was done as a function of variant allele frequencies and coverage. Our analysis revealed that deepSNV and JointSNVMix2 perform very well, especially in the low-frequency range. We attributed false positive and false negative calls of the nine tools to specific error sources and assigned them to processing steps of the pipeline. All of these errors can be expected to occur in real data sets. We found that modifying certain steps of the pipeline or parameters of the tools can lead to substantial improvements in performance. Furthermore, a novel integration strategy that combines the ranks of the variants yielded the best performance. More precisely, the rank-combination of deepSNV, JointSNVMix2, MuTect, SiNVICT and VarScan2 reached a sensitivity of 78% when fixing the precision at 90%, and outperformed all individual tools, where the maximum sensitivity was 71% with the same precision.

Conclusions: The choice of well-performing tools for alignment and variant calling is crucial for the correct interpretation of exome sequencing data obtained from mixed samples, and common pipelines are suboptimal. We were able to relate observed substantial differences in performance to the underlying statistical models of the tools, and to pinpoint the error sources of false positive and false negative calls. These findings might inspire new software developments that improve exome sequencing pipelines and further the field of precision cancer treatment.
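To make the idea of a rank-based integration concrete, the following minimal Python sketch shows one simple way such a combination could work. It is an illustration under stated assumptions, not the method used in the study: the mean-rank rule, the penalty rank for variants a caller did not report, and the function names `combine_ranks` and `sensitivity_at_precision` are all hypothetical choices made here. The second function evaluates a ranked list the way the abstract reports results, as the best sensitivity reachable while keeping precision at or above a fixed level.

```python
from typing import Dict, List, Set


def combine_ranks(caller_scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Order variants by their mean rank across callers (rank 1 = most confident).

    Assumption: higher score means higher confidence, and a variant that a
    caller did not report receives a penalty rank one past the total number
    of distinct variants.
    """
    all_variants = sorted({v for scores in caller_scores.values() for v in scores})
    worst_rank = len(all_variants) + 1
    rank_sums = {v: 0.0 for v in all_variants}
    for scores in caller_scores.values():
        # Rank this caller's variants by descending score; ties broken by name.
        ordered = sorted(scores, key=lambda v: (-scores[v], v))
        ranks = {v: i + 1 for i, v in enumerate(ordered)}
        for v in all_variants:
            rank_sums[v] += ranks.get(v, worst_rank)
    n_callers = max(len(caller_scores), 1)
    return sorted(all_variants, key=lambda v: (rank_sums[v] / n_callers, v))


def sensitivity_at_precision(ranked: List[str], truth: Set[str],
                             min_precision: float = 0.9) -> float:
    """Best sensitivity over all list prefixes whose precision is >= min_precision."""
    if not truth:
        return 0.0
    best = 0.0
    true_positives = 0
    for k, variant in enumerate(ranked, start=1):
        if variant in truth:
            true_positives += 1
        if true_positives / k >= min_precision:
            best = max(best, true_positives / len(truth))
    return best
```

With toy scores from two hypothetical callers, `combine_ranks` returns a single ordered list of variant identifiers, and `sensitivity_at_precision(ranked, truth, 0.9)` gives the fraction of true variants recovered before precision falls below 90%.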
Highlights
Next-generation sequencing of matched tumor and normal biopsy pairs has become a technology of paramount importance for precision cancer treatment
To address the aforementioned challenges, here we identify error sources in variant calling arising from the bioinformatics pipeline for read alignment and processing
We found the predictions of deepSNV and JointSNVMix2 to be highly robust against varying thresholds for base and mapping qualities (Fig. 4b red and blue)
Summary
Next-generation sequencing of matched tumor and normal biopsy pairs has become a technology of paramount importance for precision cancer treatment. Clinicians and scientists cannot take full advantage of the generated data because the accuracy of analysis pipelines is limited. This particularly concerns the reliable identification of subclonal mutations that are present at very low frequencies in a cancer tissue sample and may be clinically relevant. Even very low-frequency variants may be critical for treatment outcome, because (i) it may be sufficient if a small portion of the cells promotes tumor growth, e.g. by producing a growth factor, and (ii) drug resistance mutations may already be present in small subclones that expand upon treatment. Detecting even rare mutations is therefore crucial for improving therapy.