Abstract

All proteomes contain proteins and polypeptide segments that do not form a defined three-dimensional structure yet are biologically active; these are called intrinsically disordered proteins and regions (IDPs and IDRs). Most IDPs/IDRs lack useful functional annotation, which limits our understanding of their importance for organism fitness. Here we characterized IDRs using protein sequence annotations of functional sites and regions available in the UniProt knowledgebase (“UniProt features”: active sites, ligand-binding pockets, regions mediating protein-protein interactions, etc.). By measuring the statistical enrichment of twenty-five UniProt features in 981 IDRs of 561 human proteins, we identified eight features that are commonly located in IDRs. We then collected genetic variant data from general-population and patient-based databases and evaluated the prevalence of population and pathogenic variation in IDPs/IDRs. We observed that some IDRs tolerate 2 to 12 times more missense mutations (single amino acid substitutions) than synonymous changes in the general population. However, we also found that 37% of all germline pathogenic mutations are located in disordered regions of 96 proteins. Based on the observed-to-expected frequency of mutations, we categorized 34 IDRs in 20 proteins (DDX3X, KIT, RB1, etc.) as intolerant to mutation. Finally, using statistical analysis and a machine learning approach, we demonstrate that mutation-intolerant IDRs carry a distinct signature of functional features. Our study presents a novel approach to assigning functional importance to IDRs by leveraging the wealth of available genetic data, which will aid a deeper understanding of the role of IDRs in biological processes and disease mechanisms.
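As an illustration of what "statistical enrichment" of a feature in IDRs can mean, the following is a minimal sketch of a one-sided hypergeometric test together with a fold-enrichment ratio. The abstract does not state which test the authors used, so the choice of test, the function names, and the toy counts below are all assumptions made for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(k, K, n, N):
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    Hypothetical framing (not taken from the paper):
      N: total residues considered
      K: residues that lie in IDRs
      n: residues annotated with the feature
      k: feature-annotated residues that lie in IDRs
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability mass of observing k or more feature residues in IDRs.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def fold_enrichment(k, K, n, N):
    """Observed count divided by the count expected if the feature were
    distributed independently of disorder (expected = n * K / N)."""
    expected = n * K / N
    return k / expected


if __name__ == "__main__":
    # Toy numbers, purely illustrative.
    N, K, n, k = 1000, 200, 50, 25
    print("fold enrichment:", fold_enrichment(k, K, n, N))
    print("p-value:", hypergeom_enrichment_p(k, K, n, N))
```

A fold enrichment above 1 with a small p-value would suggest that the feature occurs in IDRs more often than expected by chance. When testing many features at once (twenty-five here), a multiple-testing correction would normally be applied on top of this.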

Highlights

  • In contrast to the standard protein structure-function paradigm, it is recognized that many proteins, in their entirety or partly in regions, lack a defined three-dimensional (3D) structure under physiological conditions, but still carry out a wide range of cellular functions [1,2]

  • Informed by human genetic diversity, we identified intrinsically disordered regions (IDRs) that are more frequently mutated in patients than in relatively healthy individuals, and further show that they carry a set of characteristic functional features

  • This study was performed on human intrinsically disordered proteins (IDPs) annotated with disorder information in the DisProt database [36] (whether a residue or region is disordered, and its category) and with residue position-specific “UniProt feature” information, which marks sites of biological interest, in the UniProt database [28]

Summary

Introduction

In contrast to the standard protein structure-function paradigm, it is recognized that many proteins, in their entirety or partly in regions, lack a defined three-dimensional (3D) structure under physiological conditions, but still carry out a wide range of cellular functions [1,2]. These biologically active, dynamic proteins and regions in proteins are known as intrinsically disordered proteins (IDPs) or regions (IDRs) [3]. In light of the growing success of predictive methods in estimating the prevalence of IDRs and in detecting IDRs and their functions, a biennial experiment for the benchmarking of intrinsic disorder prediction (CAID), inspired by the Critical Assessment of protein Structure Prediction (CASP), has been established [16].
