PolyBoost: An enhanced genomic variant classifier using extreme gradient boosting.

Daniel J Parente

doi:10.1002/prca.201900124

Abstract

Human exome sequences contain 15,000-20,000 variants but many variants have unknown clinical impact. In silico predictive classifiers are recognized by the American College of Medical Genetics as a resource for interpreting these "variants of uncertain significance." Many in silico classifiers have been developed, of which PolyPhen-2 is highly successful and widely used. PolyPhen-2 uses a naïve Bayes model to synthesize sequence, structural and genomic information. I investigated whether predictive performance could be improved by replacing PolyPhen-2's naïve Bayes model with alternative machine learning methods. Classifiers using the PolyPhen-2 feature set were retrained using extreme gradient boosting (XGBoost), random forests, artificial neural networks, and support vector machines. Classifiers were externally validated on "pathogenic" and "benign" ClinVar variants absent from the training datasets. Software is implemented in Python and is freely available at https://github.com/djparente/polyboost and the Python Package Index (PyPI) under the BSD license. An XGBoost-based classifier, designated PolyBoost (PolyPhen-2 Booster), improves discriminative performance and calibration relative to PolyPhen-2 in external validation on ClinVar. PolyBoost analyzes PolyPhen-2 output and can be incorporated into existing bioinformatics workflows as a post-analysis method to improve interpretation of clinical exome sequences obtained to identify monogenic disease.
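The abstract reports gains in both discriminative performance and calibration but does not name the metrics used. As a minimal sketch, assuming the area under the ROC curve (AUC) for discrimination and the Brier score for calibration, the comparison between a baseline score and a rescored output could look like the following. The function names, the labels, and both score lists are hypothetical toy values invented for illustration; they are not taken from the paper, from PolyPhen-2, or from the PolyBoost package.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Discrimination: chance that a random pathogenic variant (label 1)
    scores higher than a random benign variant (label 0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Calibration (and accuracy) of probabilistic scores: mean squared
    difference between predicted probability and the 0/1 label.
    Lower is better."""
    if len(labels) != len(scores) or not labels:
        raise ValueError("labels and scores must be non-empty and equal length")
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)


# Hypothetical toy data: 1 = "pathogenic", 0 = "benign".
labels = [1, 1, 1, 0, 0, 0]
baseline_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]   # stand-in for a baseline classifier
rescored = [0.95, 0.8, 0.6, 0.4, 0.2, 0.1]         # stand-in for a rescoring step

for name, scores in (("baseline", baseline_scores), ("rescored", rescored)):
    print(name, "AUC:", roc_auc(labels, scores), "Brier:", brier_score(labels, scores))
```

On an external validation set, a higher AUC together with a lower Brier score would correspond to the kind of improvement the abstract describes; the pairwise AUC loop here is quadratic and is only suitable for small illustrative inputs.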

Journal: Proteomics. Clinical applications
Publication Date: Mar 12, 2021
Citations: 3
