VariantMetaCaller: automated fusion of variant calling pipelines for quantitative, precision-based filtering.

András Gézsi, Bence Bolgár, Péter Marx, Peter Sarkozy, Péter Antal, Csaba Szalai

doi:10.1186/s12864-015-2050-y

Open Access

https://doi.org/10.1186/s12864-015-2050-y

Abstract

Background: The low concordance between different variant calling methods still poses a challenge for the widespread application of next-generation sequencing in research and clinical practice. A wide range of variant annotations can be used for filtering call sets in order to improve the precision of the variant calls, but the choice of the appropriate filtering thresholds is not straightforward. Variant quality score recalibration provides an alternative solution to hard filtering, but it requires large-scale, genomic data.

Results: We evaluated germline variant calling pipelines based on BWA and Bowtie 2 aligners in combination with GATK UnifiedGenotyper, GATK HaplotypeCaller, FreeBayes and SAMtools variant callers, using simulated and real benchmark sequencing data (NA12878 with Illumina Platinum Genomes). We argue that these pipelines are not merely discordant, but they extract complementary useful information.

We introduce VariantMetaCaller to test the hypothesis that the automated fusion of measurement-related information allows better performance than the recommended hard-filtering settings or recalibration and the fusion of the individual call sets without using annotations. VariantMetaCaller uses Support Vector Machines to combine multiple information sources generated by variant calling pipelines and estimates probabilities of variants.

This novel method had significantly higher sensitivity and precision than the individual variant callers in all target region sizes, ranging from a few hundred kilobases to whole exomes. We also demonstrated that VariantMetaCaller supports a quantitative, precision-based filtering of variants under wider conditions. Specifically, the computed probabilities of the variants can be used to order the variants, and for a given threshold, probabilities can be used to estimate precision. Precision can then be directly translated to the number of true called variants, or equivalently, to the number of false calls, which allows finding a problem-specific balance between sensitivity and precision.

Conclusions: VariantMetaCaller can be applied to small target regions and whole exomes as well, and it can be used in cases of organisms for which highly accurate variant call sets are not yet available. It can therefore be a viable alternative to hard filtering in cases where variant quality score recalibration cannot be used. VariantMetaCaller is freely available at http://bioinformatics.mit.bme.hu/VariantMetaCaller.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2050-y) contains supplementary material, which is available to authorized users.
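The link between per-variant probabilities, precision and the number of false calls can be illustrated with a minimal sketch. It assumes the probabilities are well calibrated, so that the expected number of true variants among the calls above a threshold is the sum of their probabilities. The function name and the toy probability values are hypothetical and are not taken from VariantMetaCaller itself.

```python
def estimate_precision(probabilities, threshold):
    """Estimate precision for the variants whose probability is >= threshold.

    Assumes calibrated probabilities: the expected number of true calls
    is the sum of the probabilities of the retained variants.
    Returns (n_called, expected_true, expected_false, expected_precision).
    """
    called = [p for p in probabilities if p >= threshold]
    if not called:
        # Nothing passes the threshold, so precision is undefined.
        return 0, 0.0, 0.0, None
    expected_true = sum(called)
    expected_false = len(called) - expected_true
    return len(called), expected_true, expected_false, expected_true / len(called)


if __name__ == "__main__":
    # Hypothetical per-variant probabilities, for illustration only.
    probs = [0.99, 0.95, 0.90, 0.60, 0.30, 0.10]
    for threshold in (0.5, 0.9):
        n, tp, fp, precision = estimate_precision(probs, threshold)
        print(f"threshold={threshold}: {n} calls, "
              f"~{tp:.2f} true, ~{fp:.2f} false, precision~{precision:.3f}")
```

Raising the threshold keeps fewer variants but increases the estimated precision, which is the trade-off a user would tune for a given application.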

Highlights

The low concordance between different variant calling methods still poses a challenge for the widespread application of next-generation sequencing in research and clinical practice.
Results on simulated sequencing data: We created synthetic sequencing data with known variations in the reference genome to compare the performance of previous variant calling pipelines to that of our method.
As we show in the supplementary results, the precision of all individual variant callers was relatively high for single nucleotide polymorphisms (SNPs) (> 0.99), as opposed to the precision for indels (0.6 to 0.95).

Summary

Introduction

The low concordance between different variant calling methods still poses a challenge for the widespread application of next-generation sequencing in research and clinical practice. To improve the sensitivity of a pipeline, one can use multiple variant calling methods, as it is well known that different callers produce different results [1, 3-7]. The rationale behind this practice is that the consequence of a false negative variant call (i.e. not discovering a true variant) is usually more serious than the consequence of a false positive (i.e. a spurious variant reported as real), especially in clinical settings. An application-specific balance between sensitivity and precision is needed.
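The trade-off described above can be made concrete with a small sketch. It uses the standard definitions, sensitivity = TP / (TP + FN) and precision = TP / (TP + FP), and compares the union and the intersection of two call sets against a known truth set. The variant identifiers and both call sets are made up for illustration; they are not data from the paper.

```python
def sensitivity_precision(called, truth):
    """Return (sensitivity, precision) of a call set against a truth set."""
    tp = len(called & truth)   # true variants that were called
    fp = len(called - truth)   # calls that are not true variants
    fn = len(truth - called)   # true variants that were missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Hypothetical truth set and call sets from two callers.
    truth = {"chr1:100A>G", "chr1:200C>T", "chr2:50G>A", "chr2:75T>C"}
    caller_a = {"chr1:100A>G", "chr1:200C>T", "chr3:10A>T"}
    caller_b = {"chr1:100A>G", "chr2:50G>A", "chr3:99G>C"}

    for name, calls in [("caller A", caller_a),
                        ("caller B", caller_b),
                        ("union", caller_a | caller_b),
                        ("intersection", caller_a & caller_b)]:
        sens, prec = sensitivity_precision(calls, truth)
        print(f"{name}: sensitivity={sens:.2f}, precision={prec:.2f}")
```

In this toy case the union recovers more true variants at the cost of more false calls, while the intersection does the opposite, which is why a balance between the two measures has to be chosen per application.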

Journal: BMC Genomics
Publication Date: Oct 28, 2015
Citations: 58
License type: CC BY
