Abstract
Archaeal and bacterial ribosomes contain more than 50 proteins, including 34 that are universally conserved in the three domains of cellular life (bacteria, archaea, and eukaryotes). Despite the high sequence conservation, annotation of ribosomal (r-) protein genes is often difficult because of their short lengths and biased sequence composition. We developed an automated computational pipeline for identification of r-protein genes and applied it to 995 completely sequenced bacterial and 87 archaeal genomes available in the RefSeq database. The pipeline employs curated seed alignments of r-proteins to run position-specific scoring matrix (PSSM)-based BLAST searches against six-frame genome translations, mitigating possible gene annotation errors. As a result of this analysis, we performed a census of prokaryotic r-protein complements, enumerated missing and paralogous r-proteins, and analyzed the distributions of ribosomal protein genes among chromosomal partitions. Phyletic patterns of bacterial and archaeal r-protein genes were mapped to phylogenetic trees reconstructed from concatenated alignments of r-proteins to reveal the history of likely multiple independent gains and losses. These alignments, available for download, can be used as search profiles to improve genome annotation of r-proteins and for further comparative genomics studies.
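The six-frame translation step mentioned above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names below are hypothetical, the standard genetic code is assumed, and in practice the search tool typically performs the translation internally. The sketch only shows why searching translated genomic DNA does not depend on existing gene annotations: every possible reading frame on both strands is turned into a protein string that a profile can be compared against.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here;
# the source does not state which translation table the pipeline used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to translated protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

For the toy input `ATGGCCTAA`, frame `+1` reads `MA*` (Met, Ala, stop). Short genes such as those encoding r-proteins can be missed by gene callers, which is the kind of annotation error a translated search sidesteps.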
Highlights
The ribosome, the molecular machine for protein biosynthesis, is the hallmark of cellular life forms [1]
In addition to three or four essential, highly conserved rRNA molecules, the large (50S) and small (30S) ribosomal subunits contain over 50 distinct ribosomal (r) proteins that interact with the rRNAs and with one another
Data collection: To derive comprehensive sets of bacterial and archaeal r-proteins, we developed a two-step procedure that is shown schematically in Figure 1
Summary
The ribosome, the molecular machine for protein biosynthesis, is the hallmark of cellular life forms [1]. Thirty-four r-proteins are universally conserved in the three domains of cellular life (bacteria, archaea, and eukaryotes); 33 r-proteins are shared between archaea and eukaryotes to the exclusion of bacteria; 23 r-proteins are bacteria-specific, 1 r-protein is archaea-specific, and 11 r-proteins are eukaryote-specific [6]. We included in our analysis three recently discovered ribosomal proteins that appear to be specific to the Sulfolobales/Desulfurococcales branch of archaea [7]. Genes encoding r-proteins are organized in genomic clusters that include several partially conserved operons and are often called ribosomal superoperons [8,9]. Systematic analysis of gene neighborhoods shows that ribosomal superoperons are the largest partially conserved gene arrays in bacterial and archaeal genomes [10,11].