Generalising better: Applying deep learning to integrate deleteriousness prediction scores for whole-exome SNV studies.

Ilia Korvigo, Nikolay Romashchenko, Mikhail Skoblov, Andrey Afanasyev

doi:10.1371/journal.pone.0192829

Abstract

Many automatic classifiers have been introduced to aid inference of the phenotypical effects of uncategorised nsSNVs (nonsynonymous Single Nucleotide Variations) in theoretical and medical applications. Lately, several meta-estimators have been proposed that combine different predictors, such as PolyPhen and SIFT, to integrate more information in a single score. Although many advances have been made in feature design and in the machine learning algorithms used, the shortage of high-quality reference data, along with the bias towards intensively studied in vitro models, calls for improved generalisation ability in order to further increase classification accuracy and handle records with insufficient data. Since a meta-estimator basically combines different scoring systems with highly complicated nonlinear relationships, we investigated how deep learning (supervised and unsupervised), which is particularly efficient at discovering hierarchies of features, can improve classification performance. While it is believed that one should only use deep learning for high-dimensional input spaces and other models (logistic regression, support vector machines, Bayesian classifiers, etc.) for simpler inputs, we still believe that the ability of neural networks to discover intricate structure in highly heterogeneous datasets can aid a meta-estimator. We compare the performance with various popular predictors, many of which are recommended by the American College of Medical Genetics and Genomics (ACMG), as well as with available deep learning-based predictors. Thanks to hardware acceleration, we were able to use a computationally expensive genetic algorithm to stochastically optimise hyper-parameters over many generations. Overfitting was hindered by noise injection and dropout, limiting coadaptation of hidden units.
Although we stress that this work was not conceived as a tool comparison, but rather as an exploration of the possibilities of applying deep learning to ensemble scores, our results show that even relatively simple modern neural networks can significantly improve both prediction accuracy and coverage. We provide open access to our best model via the website: http://score.generesearch.ru/services/badmut/.
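To make the regularisation mentioned in the abstract concrete, the following is a minimal sketch of a forward pass through a small meta-estimator network with Gaussian noise injection on the inputs and dropout on the hidden units. It is not the authors' architecture: the layer sizes, the noise level, the dropout rate, the ReLU activation and the function names `init_params` and `forward` are all assumptions chosen for illustration, and the input rows stand in for already-normalised component scores.

```python
import numpy as np

def init_params(n_in, n_hidden, rng):
    """Small random weights for a one-hidden-layer network (illustrative sizes)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, scores, rng=None, noise_std=0.1, drop_rate=0.5, train=False):
    """Map a batch of component scores to a probability in (0, 1).

    In training mode, Gaussian noise is added to the inputs and hidden units
    are randomly dropped (inverted dropout). In evaluation mode both are off.
    """
    x = np.asarray(scores, dtype=float)
    if train:
        x = x + rng.normal(0.0, noise_std, size=x.shape)  # noise injection
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])   # ReLU hidden layer
    if train:
        keep = rng.random(h.shape) >= drop_rate            # random dropout mask
        h = h * keep / (1.0 - drop_rate)                   # rescale surviving units
    logits = h @ params["W2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))             # sigmoid output

rng = np.random.default_rng(0)
params = init_params(n_in=4, n_hidden=8, rng=rng)

# Two made-up variants, each described by four hypothetical component scores.
batch = np.array([[0.9, 0.1, 0.8, 0.7],
                  [0.2, 0.9, 0.1, 0.3]])

p_eval = forward(params, batch)                         # deterministic
p_train = forward(params, batch, rng=rng, train=True)   # stochastic
```

Because each hidden unit may be switched off on any training pass, no unit can rely on a specific partner being present, which is the sense in which dropout limits coadaptation.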

Highlights

Single amino-acid variation is a valuable source of information that can help us understand the fundamental features of protein evolution and function, uncover causative variants behind inherent health conditions, and develop custom treatment strategies to maximise therapeutic efficiency.
Since a meta-estimator basically combines different scoring systems with highly complicated nonlinear relationships, we investigated how deep learning, which is efficient at discovering hierarchies of features, can improve classification performance.
Although we stress that this work was not conceived as a tool comparison, but rather as an exploration of the possibilities of applying deep learning to ensemble scores, our results show that even relatively simple modern neural networks can significantly improve both prediction accuracy and coverage.

Summary

Introduction

Single amino-acid variation (caused by nonsynonymous single nucleotide substitutions, nsSNVs) is a valuable source of information that can help us understand the fundamental features of protein evolution and function, uncover causative variants behind inherent health conditions, and develop custom treatment strategies to maximise therapeutic efficiency. To make predictions, deleteriousness prediction tools encode variants using multiple quantitative and qualitative features, e.g. sequence homology [2], protein structure [3, 4] and evolutionary conservation [5, 6]. This diversity of scoring tools has led to the creation of dbNSFP [7, 8, 9], a regularly updated specialised database that accumulates the predictions of various scores alongside genomic features for most of the possible variants in the human exome.
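As an illustration of how a record with several predictor scores, some of them absent, might be turned into model input, here is a minimal sketch. It is not the encoding used in the paper: the predictor names, the `.` marker for a missing value, and the choice of a zero placeholder plus a presence mask are assumptions made for the example, and `encode_variant` is a hypothetical helper.

```python
import math

# Hypothetical predictor names, used only for illustration.
PREDICTORS = ["sift", "polyphen2", "conservation"]

def encode_variant(record, predictors=PREDICTORS, missing="."):
    """Return (values, mask) for one variant record.

    values: scores as floats, with 0.0 as a placeholder for absent scores.
    mask:   1.0 where a score was present, 0.0 where it was missing.
    """
    values, mask = [], []
    for name in predictors:
        raw = record.get(name, missing)
        try:
            value = float(raw)
            present = not math.isnan(value)
        except (TypeError, ValueError):
            # Non-numeric entries (e.g. the assumed "." marker) count as missing.
            value, present = 0.0, False
        values.append(value if present else 0.0)
        mask.append(1.0 if present else 0.0)
    return values, mask

# A made-up record: one score present, one marked missing, one column absent.
values, mask = encode_variant({"sift": "0.02", "polyphen2": "."})
```

Passing the mask alongside the values lets a downstream model distinguish a genuine score of zero from a missing one, which is one possible way to keep records with incomplete data in scope.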

Journal: PloS one
Publication Date: Mar 14, 2018
Citations: 15
License type: CC BY 4.0
