Generalizable characteristics of false-positive bacterial variant calls.

Stephen J Bush

doi:10.1099/mgen.0.000615

Abstract

Minimizing false positives is a critical issue when variant calling as no method is without error. It is common practice to post-process a variant-call file (VCF) using hard filter criteria intended to discriminate true-positive (TP) from false-positive (FP) calls. These are applied on the simple principle that certain characteristics are disproportionately represented among the set of FP calls and that a user-chosen threshold can maximize the number detected. To provide guidance on this issue, this study empirically characterized all false SNP and indel calls made using real Illumina sequencing data from six disparate species and 166 variant-calling pipelines (the combination of 14 read aligners with up to 13 different variant callers, plus four ‘all-in-one’ pipelines). We did not seek to optimize filter thresholds but instead to draw attention to those filters of greatest efficacy and the pipelines to which they may most usefully be applied. In this respect, this study acts as a coda to our previous benchmarking evaluation of bacterial variant callers, and provides general recommendations for effective practice. The results suggest that, of the pipelines analysed in this study, the most straightforward way of minimizing false positives would simply be to use Snippy. We also find that a disproportionate number of false calls, irrespective of the variant-calling pipeline, are located in the vicinity of indels, and highlight this as an issue for future development.

Highlights

Minimizing false positives is a critical issue in variant calling, particularly when the presence of a given variant can inform a clinical decision
Machine-learning approaches to bacterial true-positive (TP)/false-positive (FP) classification, which could obviate the need for hard filters, are not yet widely available because of the lack of truth sets on which they may be trained
The aim of this study was to identify which positional characteristics – that is, statistics recorded for each position, such as read depth – were disproportionately associated with bacterial FP calls and to produce generalizable recommendations for hard filters broadly applicable across a range of datasets
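To make "disproportionately associated" concrete, the following is a minimal sketch, not the study's method: it compares how often one positional characteristic (read depth) falls below a cut-off among TP and FP calls. The `Call` class, the toy data and the threshold of 10 are all hypothetical and chosen only for illustration; real labels would come from a truth set.

```python
from dataclasses import dataclass


@dataclass
class Call:
    depth: int     # read depth at the called position
    is_true: bool  # TP/FP label, assumed to come from a truth set


def fraction_below(calls, threshold):
    """Return the fraction of calls whose depth is below the threshold."""
    if not calls:
        return 0.0
    return sum(c.depth < threshold for c in calls) / len(calls)


def compare(calls, threshold):
    """Return (TP fraction, FP fraction) of calls below the depth threshold."""
    tps = [c for c in calls if c.is_true]
    fps = [c for c in calls if not c.is_true]
    return fraction_below(tps, threshold), fraction_below(fps, threshold)


# Toy, made-up calls: four labelled TP and four labelled FP.
calls = [
    Call(40, True), Call(35, True), Call(8, True), Call(50, True),
    Call(5, False), Call(7, False), Call(30, False), Call(9, False),
]

# The threshold is an arbitrary illustrative value, not a recommendation.
tp_frac, fp_frac = compare(calls, threshold=10)
print(f"below threshold: TP {tp_frac:.2f}, FP {fp_frac:.2f}")
```

A characteristic is a plausible hard-filter candidate when the FP fraction is much larger than the TP fraction, since a filter on it removes many false calls at the cost of few true ones.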

Summary

Introduction

Minimizing false positives is a critical issue in variant calling, particularly when the presence of a given variant can inform a clinical decision (for instance, when diagnosing disease [2] or disease susceptibility [3], or genotyping bacterial isolates [4]). Such circumstances are not uncommon when variant calling from bacterial sequencing data. It is routine practice to post-process variant-call files (VCFs) using hard filter criteria intended to discriminate false-positive (FP) from true-positive (TP) calls [11,12,13,14,15]. Hard filters apply the simple principle that certain characteristics are disproportionately represented among the set of false-positive calls and that an empirically determined threshold can maximize the number detected. Machine-learning approaches to bacterial TP/FP classification, which could obviate the need for hard filters, are not yet widely available because of the lack of truth sets on which they may be trained.
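As an illustration of what post-processing a VCF with hard filters can look like, here is a minimal sketch using only the Python standard library. It assumes tab-separated VCF records with QUAL in the sixth column and a `DP` (depth) key in the INFO column. The function names and the thresholds (`min_qual=30.0`, `min_depth=10`) are hypothetical and are not values recommended by the study, which did not seek to optimize thresholds.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=12;MQ=60' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def passes_hard_filter(vcf_line, min_qual=30.0, min_depth=10):
    """Return True if a VCF record meets both illustrative thresholds."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]               # QUAL column
    info = parse_info(fields[7])   # INFO column
    if qual == ".":                # missing quality: treat as failing
        return False
    depth = int(info.get("DP", 0))  # missing DP is treated as depth 0
    return float(qual) >= min_qual and depth >= min_depth


def hard_filter(lines, **thresholds):
    """Keep header lines and any records that pass the hard filter."""
    return [
        line for line in lines
        if line.startswith("#") or passes_hard_filter(line, **thresholds)
    ]


# Made-up records for demonstration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=40",   # passes both thresholds
    "chr1\t200\t.\tC\tT\t12\t.\tDP=35",   # fails on quality
    "chr1\t300\t.\tG\tGA\t60\t.\tDP=4",   # fails on depth
]

filtered = hard_filter(example)
print("\n".join(filtered))
```

In practice, different variant callers populate different INFO and FORMAT fields, so the characteristic being filtered on, and where it is stored, would need to be checked for each pipeline.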

Journal: Microbial genomics	Publication Date: Aug 4, 2021
Citations: 18	License type: CC BY 4.0
