VEF: a variant filtering tool based on ensemble methods.

Chuanyi Zhang, Idoia Ochoa

doi:10.1093/bioinformatics/btz952

Open Access

Abstract

Call sets produced by current genomic analysis pipelines contain many incorrectly called variants. These can potentially be eliminated by applying state-of-the-art filtering tools, such as Variant Quality Score Recalibration (VQSR) or Hard Filtering (HF). However, these methods are very user-dependent and fail to run in some cases. We propose VEF, a variant filtering tool based on decision tree ensemble methods that overcomes the main drawbacks of VQSR and HF. Contrary to these methods, we treat filtering as a supervised learning problem, using variant call data with known 'true' variants, i.e. a gold standard, for training. Once trained, VEF can be directly applied to filter the variants contained in a given Variant Call Format (VCF) file (we consider training and testing VCF files generated with the same tools, as we assume they will share feature characteristics). For the analysis, we used whole genome sequencing (WGS) human datasets for which gold standards are available. We show on these data that the proposed filtering tool VEF consistently outperforms VQSR and HF. In addition, we show that VEF generalizes well even when some features have missing values, when the training and testing datasets differ in coverage, and when sequencing pipelines other than GATK are used. Finally, since the training needs to be performed only once, there is a significant saving in running time when compared with VQSR (approximately 4 versus 50 min for filtering the single nucleotide polymorphisms of a WGS human sample). Code and scripts are available at github.com/ChuanyiZ/vef. Supplementary data are available at Bioinformatics online.
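The supervised framing described in the abstract can be illustrated with a small, self-contained sketch. It is not VEF's implementation: it uses only the Python standard library, and bagged one-feature decision stumps stand in for the decision tree ensemble methods the paper refers to. The feature names (QD, FS, MQ), the toy VCF lines, the placeholder value for missing annotations, and all function names are assumptions made for illustration. Each call is labelled by looking it up in a gold-standard set, an ensemble is fitted on those labels, and new calls are scored by the fraction of ensemble members voting "true variant".

```python
import random

FEATURES = ["QD", "FS", "MQ"]  # assumed INFO annotations, chosen for illustration


def parse_vcf_line(line):
    """Split one VCF data line into a variant key and its numeric INFO fields."""
    chrom, pos, _id, ref, alt, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            try:
                fields[key] = float(value)
            except ValueError:
                pass  # ignore non-numeric annotations
    return (chrom, int(pos), ref, alt), fields


def featurize(fields, missing=0.0):
    """Order features consistently; absent annotations get a placeholder value."""
    return [fields.get(name, missing) for name in FEATURES]


def fit_stump(X, y):
    """Find the one-feature threshold rule with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                errors = sum((sign * (row[j] - t) >= 0) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    return best[1:]


def predict_stump(stump, row):
    j, t, sign = stump
    return sign * (row[j] - t) >= 0


def fit_ensemble(X, y, n_estimators=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    stumps = []
    while len(stumps) < n_estimators:
        idx = [rng.randrange(len(X)) for _ in X]
        if len({y[i] for i in idx}) < 2:
            continue  # resample if only one class was drawn
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def score(stumps, row):
    """Fraction of stumps voting 'true variant'."""
    return sum(predict_stump(s, row) for s in stumps) / len(stumps)


def vcf_line(pos, qd, fs, mq):
    """Build a toy VCF data line (hypothetical values, not real calls)."""
    return f"chr1\t{pos}\t.\tA\tG\t50\t.\tQD={qd};FS={fs};MQ={mq}"


train_lines = [vcf_line(100 + i, qd, fs, mq) for i, (qd, fs, mq) in enumerate([
    (22.0, 1.0, 60.0), (25.0, 0.5, 60.0), (28.0, 2.0, 59.0), (30.0, 0.0, 60.0),
    (2.0, 40.0, 30.0), (3.0, 35.0, 42.0), (4.0, 55.0, 25.0), (5.0, 60.0, 38.0),
])]
# Toy gold standard: only the first four calls are 'true' variants.
gold = {("chr1", 100 + i, "A", "G") for i in range(4)}

parsed = [parse_vcf_line(line) for line in train_lines]
X = [featurize(fields) for _, fields in parsed]
labels = [key in gold for key, _ in parsed]
model = fit_ensemble(X, labels)

good_score = score(model, featurize({"QD": 31.0, "FS": 1.0, "MQ": 60.0}))
bad_score = score(model, featurize({"QD": 1.5, "FS": 50.0}))  # MQ is missing
print(good_score, bad_score)
```

Training happens once in `fit_ensemble`; afterwards `score` can be applied to any call from a VCF file produced with the same tools, and a cutoff on the score decides which calls to keep. A real pipeline would more likely rely on an established tree-ensemble library and on the full annotation set emitted by the variant caller.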

Journal: Bioinformatics (Oxford, England)	Publication Date: Dec 24, 2019
Citations: 4	License type: cc-by-nc-nd

Similar Papers

SU26 - A WHOLE GENOME SEQUENCING STUDY IDENTIFIES A RARE VARIANT IN ANK3 THAT MAY CONTRIBUTE TO BIPOLAR DISORDER
Joanna Biernacka ... Mark Frye
European Neuropsychopharmacology | VOL. 29
01 Jan 2019

Association between a Polygenic Risk Score for Multiple Myeloma Risk and Overall Survival
...
Blood | VOL. 134
13 Nov 2019

The MAGMA pipeline for comprehensive genomic analyses of clinical Mycobacterium tuberculosis samples.
Tim H. Heupink ... Lennert Verboven
PLoS computational biology | VOL. 19
29 Nov 2023

Analysis workflow for the identification allelic variants associated with rare disorders using whole genome sequencing approach
V Maselli ... E Stupka
EMBnet.journal | VOL. 18
29 Apr 2012
