A comparative analysis of algorithms for somatic SNV detection in cancer

Nicola D Roberts, Wendy T Parker, Susan Branford, Hamish S Scott, David L Adelson, Garique Glonek, Andreas W Schreiber, R Daniel Kortschak

doi:10.1093/bioinformatics/btt375

Open Access

https://doi.org/10.1093/bioinformatics/btt375

Abstract

Motivation: With the advent of relatively affordable high-throughput technologies, DNA sequencing of cancers is now common practice in cancer research projects and will be increasingly used in clinical practice to inform diagnosis and treatment. Somatic (cancer-only) single nucleotide variants (SNVs) are the simplest class of mutation, yet their identification in DNA sequencing data is confounded by germline polymorphisms, tumour heterogeneity and sequencing and analysis errors. Four recently published algorithms for the detection of somatic SNV sites in matched cancer–normal sequencing datasets are VarScan, SomaticSniper, JointSNVMix and Strelka. In this analysis, we apply these four SNV calling algorithms to cancer–normal Illumina exome sequencing of a chronic myeloid leukaemia (CML) patient. The candidate SNV sites returned by each algorithm are filtered to remove likely false positives, then characterized and compared to investigate the strengths and weaknesses of each SNV calling algorithm.

Results: Comparing the candidate SNV sets returned by VarScan, SomaticSniper, JointSNVMix2 and Strelka revealed substantial differences with respect to the number and character of sites returned; the somatic probability scores assigned to the same sites; their susceptibility to various sources of noise; and their sensitivities to low-allelic-fraction candidates.

Availability: Data accession number SRA081939, code at http://code.google.com/p/snv-caller-review/

Contact: david.adelson@adelaide.edu.au

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

Cancer genome projects are currently working to catalogue the diversity of DNA mutations present in different cancers via high-throughput DNA sequencing of matched cancer–normal samples.
Using low probability score thresholds for inclusion to generate large candidate sets, the raw output consisted of 2667 somatic and 1720 LOH VarScan candidates; 2663 somatic and 175 LOH SomaticSniper candidates; 2178 somatic and 2040 LOH JSM2 candidates; and 438 somatic and 29 LOH Strelka candidates.
Comparing the candidate single nucleotide variant (SNV) sets returned by VarScan, SomaticSniper, JSM2 and Strelka revealed substantial differences in the number and character of sites returned; the somatic probability scores assigned to the same sites; their susceptibility to various sources of noise; and their sensitivities to candidate mutations at a low allelic fraction.
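
The kind of comparison described in the highlights can be sketched as plain set arithmetic over candidate sites. The following is a minimal illustration only, not the authors' code: the function names `pairwise_overlap` and `support_counts` are hypothetical, sites are assumed to be keyed by (chromosome, position), and the toy sites are made up rather than taken from the study.

```python
from itertools import combinations


def pairwise_overlap(calls):
    """Return {(caller_a, caller_b): (shared_sites, jaccard_index)} for each caller pair."""
    result = {}
    for a, b in combinations(sorted(calls), 2):
        shared = calls[a] & calls[b]
        union = calls[a] | calls[b]
        jaccard = len(shared) / len(union) if union else 0.0
        result[(a, b)] = (len(shared), jaccard)
    return result


def support_counts(calls):
    """Map each candidate site to the number of callers that reported it."""
    counts = {}
    for sites in calls.values():
        for site in sites:
            counts[site] = counts.get(site, 0) + 1
    return counts


# Toy, made-up candidate sites as (chromosome, position); NOT data from the study.
toy_calls = {
    "VarScan": {("chr1", 100), ("chr1", 200), ("chr2", 50)},
    "SomaticSniper": {("chr1", 100), ("chr3", 75)},
    "JSM2": {("chr1", 100), ("chr1", 200)},
    "Strelka": {("chr1", 100)},
}

overlap = pairwise_overlap(toy_calls)
support = support_counts(toy_calls)

for pair, (shared, jaccard) in overlap.items():
    print(pair, shared, round(jaccard, 3))
```

Sites reported by a single caller (a support count of 1) are natural candidates for closer inspection when asking which sources of noise a given caller is susceptible to.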

Summary

INTRODUCTION

Cancer genome projects are currently working to catalogue the diversity of DNA mutations present in different cancers via high-throughput DNA sequencing of matched cancer–normal samples. Analysis of cancer sequencing data has unique challenges, including: the need for methods that analyse matched cancer–normal samples to distinguish germline polymorphism from somatic variation; genome rearrangements that do not align well to the reference; and cancer sample heterogeneity from subclonal variation and sample impurity (Ding et al., 2010; Gundry and Vijg, 2012; Meyerson et al., 2010). In addition to this biological complexity are several sources of mapping and sequencing error, both random and systematic. Systematic error is a significant problem in cancer sequencing, as subclonal variation and sample impurity give rise to mutations at the same low allelic fractions as aggregations of systematic error.
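
To make the point about low allelic fractions concrete, the sketch below computes the expected fraction of reads carrying a somatic mutation under a simple textbook model. This model is an assumption for illustration and is not taken from the paper: it supposes a copy-neutral diploid locus by default, and the function name `expected_allelic_fraction` and the example purity and subclone values are hypothetical.

```python
def expected_allelic_fraction(purity, clonal_fraction, mutant_copies=1,
                              tumour_ploidy=2, normal_ploidy=2):
    """Expected fraction of reads carrying the mutant allele at one locus.

    purity          -- fraction of cells in the sample that are cancer cells
    clonal_fraction -- fraction of cancer cells that carry the mutation
    mutant_copies   -- mutant allele copies per mutated cell (1 = heterozygous)
    Ploidy arguments are the total allele copies per cell at this locus.
    """
    mutant = purity * clonal_fraction * mutant_copies
    total = purity * tumour_ploidy + (1 - purity) * normal_ploidy
    return mutant / total


# Hypothetical sample: 60% tumour purity, mutation present in a 25% subclone.
af = expected_allelic_fraction(purity=0.6, clonal_fraction=0.25)
depth = 100
print(f"expected allelic fraction: {af:.3f}")
print(f"expected mutant reads at {depth}x depth: {af * depth:.1f}")
```

Under these assumed values only a handful of reads in a hundred would support the variant, which is the regime in which a real subclonal mutation becomes hard to tell apart from recurrent sequencing or mapping error.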

SOMATIC SNV DETECTION

Variant calling algorithms

Filtering candidate SNV sites

RESULTS

Raw output

Comparison and characterization of candidate sites

Non-cancer exomes

CONCLUSIONS AND FUTURE PERSPECTIVES

LOH candidates

Somatic candidates

Understanding the molecular basis of cancer

Journal: Bioinformatics	Publication Date: Jul 9, 2013
Citations: 92	License type: CC BY 3.0
