Abstract

Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations' positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants' pathogenicity in terms of the perturbed molecular mechanisms.
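As an illustration of what a per-feature burden measurement could look like, the sketch below compares how often pathogenic and population variants fall on positions carrying a given 3D feature. It assumes a two-sided Fisher exact test on a 2x2 table, which is an assumption here since the abstract does not name the test; the function name `burden_test` and the counts in the usage lines are hypothetical and not taken from the study.

```python
from math import comb


def burden_test(path_with, path_without, pop_with, pop_without):
    """Odds ratio and two-sided Fisher exact p-value for one 3D feature.

    Assumed 2x2 table (all counts are variant counts):
                     feature   no feature
        pathogenic      a          b
        population      c          d
    """
    a, b, c, d = path_with, path_without, pop_with, pop_without
    # Odds ratio > 1 suggests enrichment of pathogenic variants on the feature.
    # Reported as infinity when the denominator b * c is zero.
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")

    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability that x pathogenic variants carry the feature,
        # with the table margins held fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Two-sided p-value: sum over all tables at most as likely as the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)
    )
    return odds_ratio, min(1.0, p_value)


# Hypothetical counts for a single feature, for illustration only.
odds_ratio, p_value = burden_test(120, 880, 60, 940)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3g}")
```

Repeating such a test for each of the 40 features, overall and then within each protein class, would in practice also call for a multiple-testing correction, which this sketch leaves out.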

Highlights

  • Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging

  • A plethora of disease-associated (“pathogenic”) and benign (“population”) variants has been collected in multiple databases such as Online Mendelian Inheritance in Man (OMIM) [4], Human Gene Mutation Database (HGMD) [5], ClinVar [6], Exome Aggregation Consortium (ExAC) [3], and Genome Aggregation Database [7]

  • We investigated a set of 40 3D features grouped in seven main feature categories reporting on the affected amino acid’s physicochemical properties, structural context (e.g., α-helix, β-sheet, participation in hydrogen bonds), and their role in protein activity (i.e., “functional features,” such as their involvement in an enzyme’s active site, ligand binding pocket, cellular signaling, etc.)

Summary

Introduction

Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Genetic screening is increasingly applied in clinical practice, especially for the diagnosis of rare monogenic diseases and cancer, leading to the identification of a rapidly growing number of genetic variations [1, 2]. Most of these are missense variations, which cause an amino acid substitution upon a single nucleotide change in the protein-coding region of the genome. A plethora of disease-associated (“pathogenic”) and benign (“population”) variants has been collected in multiple databases such as Online Mendelian Inheritance in Man (OMIM) [4], Human Gene Mutation Database (HGMD) [5], ClinVar [6], Exome Aggregation Consortium (ExAC) [3], and Genome Aggregation Database (gnomAD) [7]. These resources, along with an increasing amount of protein structure data available in the Protein Data Bank (PDB) [8], offer an unprecedented opportunity to characterize pathogenic and benign missense variants in the context of protein structure. Where pathogenicity scores are generated by a computational “black box” model, they are not biologically interpretable; that is, it is not possible to understand why a particular missense variant is predicted to have a high or low pathogenicity score or to establish what the molecular effect of the variation will be.

