Abstract

Motivation: With the advent of relatively affordable high-throughput technologies, DNA sequencing of cancers is now common practice in cancer research projects and will be increasingly used in clinical practice to inform diagnosis and treatment. Somatic (cancer-only) single nucleotide variants (SNVs) are the simplest class of mutation, yet their identification in DNA sequencing data is confounded by germline polymorphisms, tumour heterogeneity, and sequencing and analysis errors. Four recently published algorithms for the detection of somatic SNV sites in matched cancer–normal sequencing datasets are VarScan, SomaticSniper, JointSNVMix and Strelka. In this analysis, we apply these four SNV calling algorithms to cancer–normal Illumina exome sequencing of a chronic myeloid leukaemia (CML) patient. The candidate SNV sites returned by each algorithm are filtered to remove likely false positives, then characterized and compared to investigate the strengths and weaknesses of each SNV calling algorithm.

Results: Comparing the candidate SNV sets returned by VarScan, SomaticSniper, JointSNVMix2 and Strelka revealed substantial differences in the number and character of sites returned, the somatic probability scores assigned to the same sites, their susceptibility to various sources of noise, and their sensitivity to low-allelic-fraction candidates.

Availability: Data accession number SRA081939; code at http://code.google.com/p/snv-caller-review/

Contact: david.adelson@adelaide.edu.au

Supplementary information: Supplementary data are available at Bioinformatics online.
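The abstract says each caller's candidate sites are filtered to remove likely false positives, but does not list the filters here. The sketch below shows what such a post-calling filter could look like. The field names, the `passes_filters` helper and every threshold (minimum depth 10, at least 3 tumour alternate reads, normal variant allele fraction at most 0.02) are hypothetical illustrations, not the filters used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate somatic SNV site (hypothetical record layout)."""
    chrom: str
    pos: int
    tumour_depth: int
    tumour_alt: int
    normal_depth: int
    normal_alt: int


def vaf(alt: int, depth: int) -> float:
    """Variant allele fraction; 0.0 when there is no coverage."""
    return alt / depth if depth > 0 else 0.0


def passes_filters(c: Candidate,
                   min_depth: int = 10,          # hypothetical threshold
                   min_tumour_alt: int = 3,      # hypothetical threshold
                   max_normal_vaf: float = 0.02  # hypothetical threshold
                   ) -> bool:
    # Require adequate coverage in both samples.
    if c.tumour_depth < min_depth or c.normal_depth < min_depth:
        return False
    # Require several supporting reads in the tumour.
    if c.tumour_alt < min_tumour_alt:
        return False
    # Evidence for the variant in the normal suggests a germline site or artefact.
    if vaf(c.normal_alt, c.normal_depth) > max_normal_vaf:
        return False
    return True


def filter_candidates(cands: list[Candidate]) -> list[Candidate]:
    return [c for c in cands if passes_filters(c)]


if __name__ == "__main__":
    toy = [  # toy records, not data from the study
        Candidate("chr9", 1000, 40, 12, 35, 0),   # kept
        Candidate("chr22", 2000, 8, 4, 30, 0),    # tumour depth too low
        Candidate("chr1", 3000, 50, 20, 45, 10),  # normal VAF ~0.22, likely germline
    ]
    print([(c.chrom, c.pos) for c in filter_candidates(toy)])
```

Applying one shared filter to every caller's output puts the four candidate sets on a comparable footing before they are compared.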

Highlights

  • Cancer genome projects are currently working to catalogue the diversity of DNA mutations present in different cancers via high-throughput DNA sequencing of matched cancer–normal samples

  • With low probability-score inclusion thresholds used to generate large candidate sets, the raw output consisted of 2667 somatic and 1720 LOH VarScan candidates; 2663 somatic and 175 LOH SomaticSniper candidates; 2178 somatic and 2040 LOH JointSNVMix2 (JSM2) candidates; and 438 somatic and 29 LOH Strelka candidates

  • Comparing the candidate single nucleotide variant (SNV) sets returned by VarScan, SomaticSniper, JSM2 and Strelka revealed substantial differences in the number and character of sites returned, the somatic probability scores assigned to the same sites, their susceptibility to various sources of noise, and their differing sensitivities to candidate mutations at low allelic fraction
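Comparing candidate sets across callers comes down to set operations on genomic positions. The sketch below assumes each caller's output has already been reduced to a set of `(chromosome, position)` tuples; the helper names and the Jaccard index as the overlap measure are illustrative choices, not necessarily the comparison used in the study.

```python
from itertools import combinations

Site = tuple[str, int]  # (chromosome, 1-based position); assumed representation


def jaccard(a: set, b: set) -> float:
    """Shared sites divided by all sites reported by either caller."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(calls: dict[str, set[Site]]) -> dict[tuple[str, str], tuple[int, float]]:
    """For every pair of callers: (number of shared sites, Jaccard index)."""
    result = {}
    for x, y in combinations(sorted(calls), 2):
        result[(x, y)] = (len(calls[x] & calls[y]), jaccard(calls[x], calls[y]))
    return result


def private_sites(calls: dict[str, set[Site]]) -> dict[str, set[Site]]:
    """Sites reported by exactly one caller."""
    out = {}
    for name, sites in calls.items():
        others = set().union(*(s for n, s in calls.items() if n != name))
        out[name] = sites - others
    return out


if __name__ == "__main__":
    toy = {  # toy sets, not the study's candidates
        "VarScan": {("chr1", 100), ("chr1", 200), ("chr2", 50)},
        "SomaticSniper": {("chr1", 100), ("chr2", 50), ("chr3", 10)},
        "JSM2": {("chr1", 100)},
        "Strelka": {("chr1", 100), ("chr2", 50)},
    }
    for pair, (shared, j) in pairwise_overlap(toy).items():
        print(pair, shared, round(j, 2))
    print(private_sites(toy))
```

Sites private to one caller are usually the most informative to inspect, since they expose each caller's particular sensitivity or noise profile.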


Summary

INTRODUCTION

Cancer genome projects are currently working to catalogue the diversity of DNA mutations present in different cancers via high-throughput DNA sequencing of matched cancer–normal samples. Analysis of cancer sequencing data poses unique challenges, including the need for methods that analyse matched cancer–normal samples to distinguish germline polymorphism from somatic variation; genome rearrangements that do not align well to the reference; and cancer sample heterogeneity arising from subclonal variation and sample impurity (Ding et al, 2010; Gundry and Vijg, 2012; Meyerson et al, 2010). In addition to this biological complexity, there are several sources of mapping and sequencing error, both random and systematic. This is a significant problem in cancer sequencing, because subclonal variation and sample impurity give rise to mutations at the same low allelic fractions as aggregations of systematic error.
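The overlap between low-allelic-fraction mutations and systematic error can be made concrete with a simple model. The sketch below is a generic illustration, not the statistical model of VarScan, SomaticSniper, JointSNVMix or Strelka. It assumes a heterozygous SNV in a diploid region with no copy-number change, so its expected allelic fraction is purity × clonal fraction / 2, and it models error reads as binomial with a per-base error rate. The purity, clonal fraction, depth and error rates are hypothetical.

```python
from math import comb


def expected_vaf(purity: float, clonal_fraction: float) -> float:
    """Expected allelic fraction of a heterozygous somatic SNV in a diploid,
    copy-neutral region: only tumour cells carrying it contribute, and each
    contributes one mutant allele out of two."""
    return purity * clonal_fraction / 2.0


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance that errors alone
    produce at least k alternate reads out of n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    depth = 100                         # hypothetical read depth
    f = expected_vaf(0.6, 0.2)          # hypothetical: 60% purity, 20% subclone
    alt_reads = round(f * depth)        # about 6 supporting reads expected
    print("expected VAF:", f)
    # Random error at a typical rate rarely yields this many alternate reads...
    print("P(error, e=0.01):", binom_sf(alt_reads, depth, 0.01))
    # ...but a site with elevated systematic error often does.
    print("P(error, e=0.05):", binom_sf(alt_reads, depth, 0.05))
```

Under these assumptions a real subclonal mutation at about 6% allelic fraction is easy to separate from random error, yet hard to separate from a position with a few percent systematic error, which is the confounding described above.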

SOMATIC SNV DETECTION
Variant calling algorithms
Filtering candidate SNV sites
RESULTS
Raw output
Comparison and characterization of candidate sites
Non-cancer exomes
CONCLUSIONS AND FUTURE PERSPECTIVES
LOH candidates
Somatic candidates
Understanding the molecular basis of cancer