Abstract
Proteome-scale bioinformatics research is increasingly conducted as the number of completely sequenced genomes increases, but analysis of protein domains (PDs) usually relies on similarity in their amino acid sequences and/or three-dimensional structures. Here, we present results from a bi-clustering analysis of presence/absence data for 6,580 unique PDs in 2,134 species with a sequenced genome, thus covering complete sets of proteins for the three superkingdoms of life: Bacteria, Archaea, and Eukarya. Our analysis revealed eight distinctive PD clusters. Analysis of Gene Ontology function enrichment and CATH classification of protein structures showed that these clusters exhibit taxa-characteristic structural and functional properties. For example, the largest cluster is ubiquitous in all three superkingdoms; it constitutes a set of 1,472 persistent domains that were created early in evolution, have been retained in living organisms, and are characterized by basic cellular functions and ancient structural architectures. An Archaea and Eukarya bi-superkingdom cluster suggests that its PDs may have existed in the ancestor of the two superkingdoms, and other clusters are specific to a single superkingdom or taxon (e.g. Fungi). These results increase our appreciation of PD diversity and our knowledge of how PDs are used in species, with implications for species evolution.
Highlights
Proteins are formed by modules, commonly referred to as domains, linked together in a polypeptide chain
Unlike the studies reported by Caetano-Anolles and colleagues[9] in which the abundance of protein domains (PDs) in species was coded into 21 alphabets and a parsimonious path was used to define the evolutionary history of PDs, the aim of the present work was to identify PD clusters that were defined by similarity of species usage and characterize these in terms of species-related protein function and structure
We used the 3D classifications of CATH and the functional annotations of Gene Ontology (GO)[20] to characterize PD clusters that were defined by similarities of their species usage
Summary
Proteins are formed by modules, commonly referred to as domains, linked together in a polypeptide chain. By analyzing the PD content of proteomes of various species, studies can reveal the origin and evolutionary history of PDs[5–10] and identify those that seem to be used only by certain taxa, such as Bacteria[11,12,13], which could, for example, provide useful targets for the development of drugs against microbial pathogens[14,15]. These post-genomic analyses are testaments to the wealth of knowledge that can be mined by interrogating the relationship between PDs and their species usage. Our results provide new perspectives on the relationship between species usage and the structure and function of proteins.
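To make the idea of grouping PDs by species usage concrete, the following is a minimal Python sketch on toy data. It is not the bi-clustering procedure used in the study: it substitutes a much simpler rule that assigns each PD to the set of superkingdoms in which it is present in at least a given fraction of species. The species names, PD names, presence/absence values, the `min_fraction` threshold, and the helper functions `usage_profile` and `group_by_profile` are all hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical toy data (not from the study): five made-up species and
# their superkingdoms.
species = ["bact_1", "bact_2", "arch_1", "euk_1", "euk_2"]
superkingdom = {
    "bact_1": "Bacteria",
    "bact_2": "Bacteria",
    "arch_1": "Archaea",
    "euk_1": "Eukarya",
    "euk_2": "Eukarya",
}

# Presence/absence matrix: one row per PD, one column per species
# (1 = PD present in that species' proteome, 0 = absent).
presence = {
    "PD_a": [1, 1, 1, 1, 1],
    "PD_b": [0, 0, 1, 1, 1],
    "PD_c": [1, 1, 0, 0, 0],
    "PD_d": [0, 0, 0, 1, 0],
}


def usage_profile(row, min_fraction=0.5):
    """Return the superkingdoms where the PD occurs in >= min_fraction of species."""
    counts = defaultdict(lambda: [0, 0])  # superkingdom -> [present, total]
    for name, bit in zip(species, row):
        sk = superkingdom[name]
        counts[sk][0] += bit
        counts[sk][1] += 1
    return frozenset(
        sk for sk, (present, total) in counts.items()
        if present / total >= min_fraction
    )


def group_by_profile(matrix):
    """Group PD names by their superkingdom usage profile."""
    groups = defaultdict(list)
    for pd_name, row in matrix.items():
        groups[usage_profile(row)].append(pd_name)
    return dict(groups)


groups = group_by_profile(presence)
for profile, pds in groups.items():
    print(sorted(profile), pds)
```

In this toy example, `PD_a` falls into a profile shared by all three superkingdoms, `PD_b` into an Archaea and Eukarya profile, and `PD_c` and `PD_d` into single-superkingdom profiles. A real bi-clustering analysis would instead cluster the PDs and the species simultaneously, without relying on predefined taxon labels.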