Abstract
The protein universe consists of a continuum of structures ranging from full order to complete disorder. As the structured part of the proteome has been intensively studied, stably folded proteins are increasingly well documented and understood. However, proteins that are fully, or in large part, disordered are much less well characterized. Here we collected NMR chemical shifts in a small database for 117 protein sequences that are known to contain disorder. We demonstrate that NMR chemical shift data can be brought to bear as an exquisite judge of protein disorder at the residue level, and help in validation. With the help of secondary chemical shift analysis we demonstrate that the proteins in the database span the full spectrum of disorder, but still, largely segregate into two classes; disordered with small segments of order scattered along the sequence, and structured with small segments of disorder inserted between the different structured regions. A detailed analysis reveals that the distribution of order/disorder along the sequence shows a complex and asymmetric distribution, that is highly protein-dependent. Access to ratified training data further suggests an avenue to improving prediction of disorder from sequence.
Highlights
A systematic re-examination of the protein structure–function paradigm is required to accommodate intrinsically unfolded/disordered proteins (IDPs)
Following our bottom-up approach to include all available data for potentially disordered proteins in the BMRB database, it was intended that the CheZOD database should be representative for the full range of intrinsically disordered proteins
The CheZOD database is relatively small compared to other databases with 117 proteins, but one can imagine that the CheZOD database could be expanded even further in the future by including searches for entries in other databases such as MobiDB (Potenza et al, 2014), IDEAL (Fukuchi et al, 2014), and D2P2 (Oates et al, 2013)
Summary
A systematic re-examination of the protein structure–function paradigm is required to accommodate intrinsically unfolded/disordered proteins (IDPs). There are two major reasons for this reappraisal: (1) the results of bioinformatics analyses of the genomic codes for protein amino acid sequences, and (2) the accumulation of experimental evidence for the existence of a rather large number of protein domains and even entire proteins, lacking ordered structure under physiological conditions (Dyson and Wright, 2001, 2004; Uversky, 2003; Vucetic et al, 2003). Disorder maps to proteins with important functions, such as signal transduction and control of transcription, and IDPs are involved in all major classes of disease (Gregersen et al, 2006; Uversky et al, 2008). For the Diversity in Disorder biological relevance of disorder for the selected proteins, the interested reader is referred to the original papers that described the proteins considered
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.