Prioritizing genomic variants pathogenicity via DNA, RNA, and protein-level features based on extreme gradient boosting.

Maolin Ding, Huiying Zhao, Yuedong Yang, Ken Chen

doi:10.1007/s00439-024-02667-0

Abstract

Genetic diseases are mostly associated with genetic variants, including missense, synonymous, nonsense, and copy number variants. Previous studies indicate that these different kinds of variants affect phenotypes in various ways. Understanding the functional consequences of these variants, especially the noncoding ones, remains essential but challenging because corresponding annotations are lacking. Many computational methods have been proposed to identify risk variants, but most of them rely only on DNA-level and protein-level annotations to predict pathogenicity, and others are restricted to missense variants. In this study, we curated DNA-, RNA-, and protein-level features to discriminate disease-causing variants in both coding and noncoding regions. Features of protein sequences and protein structures proved essential for analyzing missense variants in coding regions, while features related to RNA splicing and RBP binding were significant for variants in noncoding regions and for synonymous variants in coding regions. By integrating these features, we formulated the Multi-level feature Genomic Variants Predictor (ML-GVP) using gradient boosting trees. The method was trained on more than 400,000 variants in the Sherloc training set from the 6th Critical Assessment of Genome Interpretation and showed superior performance. It was one of the two best-performing predictors on the blind test in the Sherloc assessment, and its performance was further confirmed on an independent test dataset of de novo variants.
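
The idea of combining DNA-, RNA-, and protein-level features in a boosted tree model can be illustrated with a minimal sketch. This is not the ML-GVP implementation: it is plain gradient boosting on the logistic loss with one-split trees (stumps), written in standard-library Python, whereas extreme gradient boosting additionally uses second-order gradients and regularization. The three feature names, the toy values, and the labels below are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical feature names, one per annotation level (not the paper's feature set).
FEATURES = ["dna_conservation", "rna_splice_delta", "protein_stability_delta"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return the one-feature threshold split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    _, j, thr, lm, rm = best
    return j, thr, lm, rm


def train(X, y, n_rounds=20, learning_rate=0.5):
    """Fit stumps sequentially to the negative gradient of the log loss."""
    pos = sum(y) / len(y)
    base = math.log(pos / (1.0 - pos))  # initial score: log-odds of the positive class
    scores = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # For log loss, the negative gradient with respect to the score is y - p.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        scores = [s + learning_rate * (lm if row[j] <= thr else rm)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    """Predicted probability that a variant is pathogenic."""
    base, lr, stumps = model
    score = base + sum(lr * (lm if row[j] <= thr else rm) for j, thr, lm, rm in stumps)
    return sigmoid(score)


# Toy, made-up variants: columns follow FEATURES; label 1 = pathogenic, 0 = benign.
X = [
    [0.90, 0.10, 0.80],
    [0.80, 0.70, 0.10],
    [0.95, 0.60, 0.70],
    [0.70, 0.80, 0.20],
    [0.20, 0.10, 0.10],
    [0.30, 0.05, 0.20],
    [0.10, 0.20, 0.05],
    [0.40, 0.10, 0.15],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

model = train(X, y)
```

Calling `predict_proba(model, [0.85, 0.5, 0.5])` scores a new (again hypothetical) variant. In practice a library such as XGBoost would replace the hand-written training loop, and the feature vector would hold many annotations per level rather than one.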


