Abstract
BackgroundA significant number of proteins have been shown to be intrinsically disordered, meaning that they lack a fixed 3 D structure or contain regions that do not posses a well defined 3 D structure. It has also been proven that a protein's disorder content is related to its function. We have performed an exhaustive analysis and comparison of the disorder content of proteins from prokaryotic organisms (i.e., superkingdoms Archaea and Bacteria) with respect to functional categories they belong to, i.e., Clusters of Orthologous Groups of proteins (COGs) and groups of COGs-Cellular processes (Cp), Information storage and processing (Isp), Metabolism (Me) and Poorly characterized (Pc).We also analyzed the disorder content of proteins with respect to various genomic, metabolic and ecological characteristics of the organism they belong to. We used correlations and association rule mining in order to identify the most confident associations between specific modalities of the characteristics considered and disorder content.ResultsBacteria are shown to have a somewhat higher level of protein disorder than archaea, except for proteins in the Me functional group. It is demonstrated that the Isp and Cp functional groups in particular (L-repair function and N-cell motility and secretion COGs of proteins in specific) possess the highest disorder content, while Me proteins, in general, posses the lowest. Disorder fractions have been confirmed to have the lowest level for the so-called order-promoting amino acids and the highest level for the so-called disorder promoters.For each pair of organism characteristics, specific modalities are identified with the maximum disorder proteins in the corresponding organisms, e.g., high genome size-high GC content organisms, facultative anaerobic-low GC content organisms, aerobic-high genome size organisms, etc. Maximum disorder in archaea is observed for high GC content-low genome size organisms, high GC content-facultative anaerobic or aquatic or mesophilic organisms, etc. Maximum disorder in bacteria is observed for high GC content-high genome size organisms, high genome size-aerobic organisms, etc.Some of the most reliable association rules mined establish relationships between high GC content and high protein disorder, medium GC content and both medium and low protein disorder, anaerobic organisms and medium protein disorder, Gammaproteobacteria and low protein disorder, etc. A web site Prokaryote Disorder Database has been designed and implemented at the address http://bioinfo.matf.bg.ac.rs/disorder, which contains complete results of the analysis of protein disorder performed for 296 prokaryotic completely sequenced genomes.ConclusionsExhaustive disorder analysis has been performed by functional classes of proteins, for a larger dataset of prokaryotic organisms than previously done. Results obtained are well correlated to those previously published, with some extension in the range of disorder level and clear distinction between functional classes of proteins. Wide correlation and association analysis between protein disorder and genomic and ecological characteristics has been performed for the first time. The results obtained give insight into multi-relationships among the characteristics and protein disorder. Such analysis provides for better understanding of the evolutionary process and may be useful for taxon determination. The main drawback of the approach is the fact that the disorder considered has been predicted and not experimentally established.
Highlights
A significant number of proteins have been shown to be intrinsically disordered, meaning that they lack a fixed 3 D structure or contain regions that do not posses a well defined 3 D structure
As a result of a growing number of experimental data on protein structure determination, it became evident that a significant number of proteins, under physiological conditions, do not possess a well defined 3 D ordered structure
They exhibit a variety of conformational isomers in which the atom positions and the polypeptide backbone of the Ramachandran plot vary over time, with no specific equilibrium values, typically involving non-cooperative conformational changes [1]. They are known by different names, such as: “natively disordered/ unfolded/denatured proteins”, or “intrinsically disordered/unfolded/unstructured proteins”, or “rheomorphic proteins”, with the most frequently used term being “intrinsically disordered proteins (IDP)” and are recently reviewed in detail in [2,3,4,5,6,7,8,9,10,11,12]
Summary
A significant number of proteins have been shown to be intrinsically disordered, meaning that they lack a fixed 3 D structure or contain regions that do not posses a well defined 3 D structure. As a result of a growing number of experimental data on protein structure determination, it became evident that a significant number of proteins, under physiological conditions, do not possess a well defined 3 D ordered structure. They exhibit a variety of conformational isomers in which the atom positions and the polypeptide backbone ( and ψ torsion angles) of the Ramachandran plot vary over time, with no specific equilibrium values, typically involving non-cooperative conformational changes [1]. In this paper we will use the term “disordered proteins” (DP) They may be completely disordered, or may be composed of both ordered and disordered regions of various lengths. Fully DPs represent a special group of proteins of various lengths
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.