Abstract

BackgroundThe low concordance between different variant calling methods still poses a challenge for the wide-spread application of next-generation sequencing in research and clinical practice. A wide range of variant annotations can be used for filtering call sets in order to improve the precision of the variant calls, but the choice of the appropriate filtering thresholds is not straightforward. Variant quality score recalibration provides an alternative solution to hard filtering, but it requires large-scale, genomic data.ResultsWe evaluated germline variant calling pipelines based on BWA and Bowtie 2 aligners in combination with GATK UnifiedGenotyper, GATK HaplotypeCaller, FreeBayes and SAMtools variant callers, using simulated and real benchmark sequencing data (NA12878 with Illumina Platinum Genomes). We argue that these pipelines are not merely discordant, but they extract complementary useful information.We introduce VariantMetaCaller to test the hypothesis that the automated fusion of measurement related information allows better performance than the recommended hard-filtering settings or recalibration and the fusion of the individual call sets without using annotations. VariantMetaCaller uses Support Vector Machines to combine multiple information sources generated by variant calling pipelines and estimates probabilities of variants.This novel method had significantly higher sensitivity and precision than the individual variant callers in all target region sizes, ranging from a few hundred kilobases to whole exomes. We also demonstrated that VariantMetaCaller supports a quantitative, precision based filtering of variants under wider conditions. Specifically, the computed probabilities of the variants can be used to order the variants, and for a given threshold, probabilities can be used to estimate precision. Precision then can be directly translated to the number of true called variants, or equivalently, to the number of false calls, which allows finding problem-specific balance between sensitivity and precision.ConclusionsVariantMetaCaller can be applied to small target regions and whole exomes as well, and it can be used in cases of organisms for which highly accurate variant call sets are not yet available, therefore it can be a viable alternative to hard filtering in cases where variant quality score recalibration cannot be used. VariantMetaCaller is freely available at http://bioinformatics.mit.bme.hu/VariantMetaCaller.Electronic supplementary materialThe online version of this article (doi:10.1186/s12864-015-2050-y) contains supplementary material, which is available to authorized users.

Highlights

  • The low concordance between different variant calling methods still poses a challenge for the wide-spread application of next-generation sequencing in research and clinical practice

  • Results on simulated sequencing data We created synthetic sequencing data with known variations in the reference genome to compare the performance of previous variant calling pipelines to that of our method

  • As we show in the supplementary results, the precision of all individual variant callers was relatively high for Single nucleotide polymorphism (SNP) (> 0.99) as opposed to the precision for indels (0.6 − 0.95)

Read more

Summary

Introduction

The low concordance between different variant calling methods still poses a challenge for the wide-spread application of next-generation sequencing in research and clinical practice. To further improve the sensitivity of the pipeline, one can use multiple variant calling methods, as it is a well-known fact that different callers produce different results [1, 3,4,5,6,7]. The rationale behind this practice is that the consequence of a false negative variant call (i.e. not discovering a true variant) is usually more serious than the consequence of a false positive (i.e. unreal variant claimed to be real), especially in clinical settings. An application-specific balance between sensitivity and precision is needed

Objectives
Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call