Abstract

The human genome harbors a variety of genetic variations. Single-nucleotide changes that alter amino acids in protein-coding regions are one of the major causes of human phenotypic variation and disease. These single-amino acid variations (SAVs) are routinely found in whole-genome and exome sequencing, and evaluating their functional impact is crucial for the diagnosis of genetic disorders. We developed DeepSAV, a deep-learning convolutional neural network that differentiates disease-causing from benign SAVs based on a variety of protein sequence, structural, and functional properties. Our method outperforms most stand-alone programs, and the version incorporating population and gene-level information (DeepSAV+PG) has predictive power similar to that of some of the best available methods. We transformed the DeepSAV scores of rare SAVs in the human population into a quantity, termed the “mutation severity measure”, for each human protein-coding gene. This measure reflects a gene's tolerance to deleterious missense mutations and serves as a useful tool for studying gene-disease associations. By this measure, genes implicated in cancer, autism, and viral interaction are intolerant to mutations, whereas genes associated with a number of other diseases are tolerant. Among known disease-associated genes, mutation-intolerant ones are likely to function in development and signal transduction pathways, while mutation-tolerant ones tend to encode metabolic and mitochondrial proteins.
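The abstract does not spell out how per-variant DeepSAV scores are turned into a per-gene mutation severity measure. The Python sketch below only illustrates the general idea of aggregating variant-level scores of rare SAVs into one value per gene. The function name `gene_severity`, the `(gene, maf, score)` input layout, the use of a plain mean, and the MAF cutoff of 0.0001 (borrowed from the example of rare variation given in the introduction) are all hypothetical choices for illustration, not the published method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cutoff: the introduction cites MAF < 0.0001 as one example of "rare".
RARE_MAF_CUTOFF = 1e-4


def gene_severity(variants, maf_cutoff=RARE_MAF_CUTOFF):
    """Average the per-variant scores of rare SAVs for each gene (hypothetical).

    `variants` is an iterable of (gene, maf, score) tuples, where `score` is
    assumed to be a per-variant deleteriousness score (higher = more likely
    disease-causing). Plain averaging is a stand-in aggregation, not the
    transformation used for the published mutation severity measure.
    """
    scores_by_gene = defaultdict(list)
    for gene, maf, score in variants:
        if maf < maf_cutoff:  # keep only rare variants
            scores_by_gene[gene].append(score)
    # Genes with no rare variants do not appear in the result.
    return {gene: mean(scores) for gene, scores in scores_by_gene.items()}


example = [
    ("GENE_A", 0.00002, 0.875),
    ("GENE_A", 0.00005, 0.625),
    ("GENE_A", 0.2, 0.1),  # common variant, excluded by the MAF filter
    ("GENE_B", 0.00001, 0.25),
]
severity = gene_severity(example)
print(severity)  # {'GENE_A': 0.75, 'GENE_B': 0.25}
```

Under these assumptions, a gene whose rare missense variants mostly receive high scores gets a high value; how such a value maps onto "intolerant" versus "tolerant" genes would depend on the actual transformation used in the study.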

Highlights

  • Genetic variations are major determinants of human diseases and phenotypes [1]

  • Human genetic variations in various forms are constantly found in whole-genome and exome sequencing of the general population and of patients

  • It remains a challenging task to assess the functional impact of these variations


Summary

Introduction

The accelerating pace of large-scale genome and exome sequencing projects has greatly expanded the landscape of human genetic variations. Comprehensive analysis of genetic variations, especially those found in and near the exons of protein-coding genes [3], may shed light on gene-disease relationships and provide insight into the mechanisms of diseases and phenotypic variation [4]. The growing number of sequenced human genomes and exomes from the general population enhances the statistical power of such analyses [5]. Since genetic variations, most notably single-nucleotide variants (SNVs), were first documented, additional rare genetic variations at the individual level (e.g., those with a minor allele frequency (MAF) below 0.0001) have been identified in large-scale sequencing projects of the general population [5] as well as of patients with certain diseases, such as cancer [12] and intellectual disability [13]. Nevertheless, disease gene prioritization and the discovery of disease-causing variations remain difficult [19, 20].

