Abstract

Next-generation sequencing technologies are providing increasing amounts of sequencing data, paving the way for improvements in clinical genetics and precision medicine. The interpretation of the observed genomic variants in the light of their phenotypic effects is thus emerging as a crucial task in advancing our understanding of how exomic variants affect proteins and how the proteins' functional changes affect human health. Since the experimental evaluation of the effects of every observed variant is unfeasible, bioinformatics methods are being developed to address this challenge in-silico, by predicting the impact of millions of variants, thus providing insight into the deleteriousness landscape of entire proteomes. Here we show the feasibility of this approach by using the recently developed DEOGEN2 variant-effect predictor to perform the largest in-silico mutagenesis scan to date. We computed the deleteriousness score of 170 million variants over 15,000 human proteins and analysed the results, investigating how the predicted deleteriousness landscape of the proteins relates to known functionally and structurally relevant protein regions and biophysical properties. Moreover, we qualitatively validated our results by comparing them with two mutagenesis studies targeting two specific proteins, showing the consistency of DEOGEN2 predictions with respect to experimental data.

Highlights

  • The next-generation sequencing revolution is providing an unprecedented amount of human sequence variation data[1], allowing bioinformatics to address the challenging task of the in-silico interpretation of the phenotypic effects of genetic variants[2,3]

  • We selected only proteins for which high-quality annotations are available, as our goal is to investigate the relationship between deleteriousness predictions and biological and molecular aspects of the cell as annotated in UniprotKB327

  • In this study we performed what is, to our knowledge, the largest in-silico mutagenesis analysis to date, analysing nearly 170 million Single Amino-acid Variant (SAV) predictions computed with our DEOGEN2 method

Summary

Introduction

The major limitation of this method is that it does not give a complete picture of which mutations, and to which amino acids, affect protein function the most (or least): ideally, experimental mutagenesis would substitute each residue with all 19 other amino acids. An accurate and fast computational tool can predict the likely impact of millions of variants at an extremely low cost. Such in-silico mutagenesis studies have already been performed for specific proteins[13,14,17,19], suggesting that the interpretation of these results may help by i) targeting further in-vitro experimental verification and ii) providing actual insight into the deleteriousness landscape of the protein under investigation, for example by highlighting putative functionally or structurally relevant sites[17]. For the Melanocortin receptor, we blind-tested the DEOGEN2 predictions on 159 experimentally annotated variants extracted from[17], showing that our predictor is able to distinguish between neutral SAVs and ones with functional consequences.
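
To make the idea of an exhaustive in-silico mutagenesis scan concrete, the following minimal sketch enumerates every possible SAV of a protein sequence, that is, each residue substituted with each of the 19 other standard amino acids. It is an illustration only, not the DEOGEN2 pipeline: the function names `enumerate_savs` and `format_sav` and the toy sequence are assumptions introduced here, and no scoring step is shown.

```python
# Hypothetical illustration: enumerate all single amino-acid variants (SAVs)
# of a sequence. This is NOT the DEOGEN2 implementation; names are assumed.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def enumerate_savs(sequence):
    """Yield (position, wild_type, mutant) for every possible SAV.

    Positions are 1-based, following the usual variant notation.
    Residues outside the 20 standard amino acids are skipped.
    """
    for position, wild_type in enumerate(sequence, start=1):
        if wild_type not in AMINO_ACIDS:
            continue
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield position, wild_type, mutant


def format_sav(position, wild_type, mutant):
    """Format a variant in the common wild-type/position/mutant style, e.g. 'M1A'."""
    return f"{wild_type}{position}{mutant}"


if __name__ == "__main__":
    toy_sequence = "MKT"  # assumed toy input, not a real protein
    variants = [format_sav(*v) for v in enumerate_savs(toy_sequence)]
    print(len(variants), variants[:3])
```

A sequence of N standard residues yields 19 × N variants, so each candidate would be passed to a variant-effect predictor in turn; this is why a proteome-wide scan reaches the hundred-million scale.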
