SiNPle: Fast and Sensitive Variant Calling for Deep Sequencing Data.

Luca Ferretti, Chandana Tennakoon, Graham Freimanis, Paolo Ribeca, Adrian Silesian

doi:10.3390/genes10080561

Abstract

Current high-throughput sequencing technologies can generate sequence data and provide information on the genetic composition of samples at very high coverage. Deep sequencing approaches enable the detection of rare variants in heterogeneous samples, such as viral quasi-species, but also have the undesired effect of amplifying sequencing errors and artefacts. Distinguishing real variants from such noise is not straightforward. Variant callers that can handle pooled samples can struggle at extremely high read depths, while at lower depths sensitivity is often sacrificed for specificity. In this paper, we propose SiNPle (Simplified Inference of Novel Polymorphisms from Large coveragE), a fast and effective software tool for variant calling. SiNPle is based on a simplified Bayesian approach to compute the posterior probability that a variant is not generated by sequencing errors or PCR artefacts. The Bayesian model takes into consideration individual base qualities as well as their distribution, the baseline error rates during both the sequencing and the PCR stages, the prior distribution of variant frequencies, and their strandedness. Our approach leads to an approximate but extremely fast computation of posterior probabilities even for very high coverage data, since the expression for the posterior distribution is a simple analytical formula in terms of summary statistics for the variants appearing at each site in the genome. These statistics can be used to filter out putative SNPs and indels according to the required level of sensitivity. We tested SiNPle on several simulated and real-life viral datasets to show that it is faster and more sensitive than existing methods. The source code for SiNPle is freely available to download and compile, or as a Conda/Bioconda package.
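
To make the idea of a posterior probability that a variant is not a sequencing error more concrete, here is a minimal Python sketch. It is not SiNPle's model or its analytical formula: it is a generic toy calculation that only uses per-base Phred qualities, a log-uniform prior on the variant frequency, and brute-force numerical averaging. The function names, the prior probability `prior_variant`, the frequency range, and the grid size are all assumptions made for illustration. PCR errors and strandedness, which the abstract lists as part of the real model, are ignored.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to an error probability (capped at 0.75)."""
    return min(10.0 ** (-q / 10.0), 0.75)


def log_likelihood(f, ref_quals, alt_quals):
    """Log-likelihood of the reads at one site given a true variant frequency f.

    Assumes a sequencing error turns a base into each of the three
    other bases with equal probability e / 3.
    """
    ll = 0.0
    for q in ref_quals:
        e = phred_to_error(q)
        ll += math.log((1.0 - f) * (1.0 - e) + f * e / 3.0)
    for q in alt_quals:
        e = phred_to_error(q)
        ll += math.log(f * (1.0 - e) + (1.0 - f) * e / 3.0)
    return ll


def variant_posterior(ref_quals, alt_quals, prior_variant=1e-3,
                      f_min=1e-4, f_max=0.5, grid=200):
    """Toy posterior probability that the alternative reads are a real variant.

    H0: no variant (f = 0), so every alternative read is an error.
    H1: a variant with frequency f drawn from a log-uniform prior on
        [f_min, f_max], averaged numerically over a grid (hypothetical choice).
    """
    ll0 = log_likelihood(0.0, ref_quals, alt_quals)

    lo, hi = math.log(f_min), math.log(f_max)
    lls = [
        log_likelihood(math.exp(lo + (hi - lo) * i / (grid - 1)),
                       ref_quals, alt_quals)
        for i in range(grid)
    ]
    # Log of the mean likelihood over the grid, computed stably.
    m = max(lls)
    ll1 = m + math.log(sum(math.exp(x - m) for x in lls) / grid)

    a = math.log(prior_variant) + ll1        # log P(H1) + log P(data | H1)
    b = math.log(1.0 - prior_variant) + ll0  # log P(H0) + log P(data | H0)
    mx = max(a, b)
    return math.exp(a - mx) / (math.exp(a - mx) + math.exp(b - mx))
```

With these assumptions, `variant_posterior([30] * 980, [30] * 20)` (twenty high-quality alternative reads out of a thousand) should give a posterior close to 1, while `variant_posterior([30] * 999, [10])` (a single low-quality alternative read) should give a posterior close to 0. The grid average costs time proportional to coverage times grid size, which is the kind of cost a closed-form expression in summary statistics, as described in the abstract, is meant to avoid.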

Highlights

Detection of low-frequency variants is an important area in the downstream analysis of high-throughput sequencing
Study of genetic variation in heterogeneous samples is another research area that has been facilitated by recent technological advances, making it possible to generate high coverage data that enable deep sequencing and detection of low-frequency variants
In a Bayesian context, the posterior probability of true and false variants at a given site can be approximated by the product of marginal probabilities up to a factor $1 + \sum_i O(f_i)$, where $f_i$ are the frequencies of the minor variants
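
The last highlight can be written schematically as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $D$ stands for the reads at a site, and $v_i$ for the true-or-false status of the $i$-th minor variant.

```latex
P(v_1, \dots, v_K \mid D)
  \;=\; \Bigl[\, \prod_{i=1}^{K} P(v_i \mid D) \Bigr]
        \Bigl( 1 + \sum_{i=1}^{K} O(f_i) \Bigr)
```

Read this way, when all minor-variant frequencies satisfy $f_i \ll 1$ the correction factor is close to 1, so each variant at a site can be scored from its own marginal probability.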

Summary

Introduction

Detection of low-frequency variants is an important area in the downstream analysis of high-throughput sequencing. In cancer studies, it can provide a means of detecting circulating cancer cells and can help in early diagnosis and prognosis, or in detecting relapse. It is also useful for the study of DNA populations, for example to analyse cancer heterogeneity and the evolution of viral quasi-species [1]. The study of genetic variation in heterogeneous samples is another research area that has been facilitated by recent technological advances, which make it possible to generate high-coverage data that enable deep sequencing and detection of low-frequency variants. In scenarios involving targeted sequencing, …

Genes 2019, 10, 561; doi:10.3390/genes10080561; www.mdpi.com/journal/genes

Journal: Genes	Publication Date: Jul 25, 2019
Citations: 9	License type: CC BY 4.0
