Abstract
Background
The structural and functional annotation of genomes is now heavily based on data obtained using automated pipeline systems. The key to accurate structural annotation is blending similarities between closely related genomes with biochemical evidence for the interpretation of the genome. In this work we applied high-throughput proteogenomics to Ruegeria pomeroyi, a member of the Roseobacter clade, an abundant group of marine bacteria, as a seed for the annotation of the whole clade.
Results
A large dataset of peptides from R. pomeroyi was obtained after searching over 1.1 million MS/MS spectra against a six-frame translated genome database. We identified 2006 polypeptides, of which thirty-four were encoded by open reading frames (ORFs) that had not previously been annotated. From the pool of 'one-hit-wonders', i.e. those ORFs specified by only one peptide detected by tandem mass spectrometry, we could confirm the probable existence of five additional new genes after proving that the corresponding RNAs were transcribed. We also identified the most N-terminal peptide of 486 polypeptides, of which sixty-four had originally been wrongly annotated.
Conclusions
By extending these re-annotations to the other thirty-six Roseobacter isolates sequenced to date (twenty different genera), we propose the correction of the assigned start codons of 1082 homologous genes in the clade. In addition, we report the presence of novel genes within operons encoding determinants of the important tricarboxylic acid cycle, a feature that seems to be characteristic of some Roseobacter genomes. The detection of their corresponding products in large amounts raises the question of their function. Their discovery points to a possible theory of protein evolution that relies on high expression of orphans in bacteria: their putative poor efficiency could be counterbalanced by a higher level of expression. Our proteogenomic analysis will increase the reliability of the future annotation of marine bacterial genomes.
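To illustrate what a six-frame translated genome database contains, the following minimal Python sketch translates a nucleotide sequence in the three forward and three reverse-complement reading frames. It is not the pipeline used in this work: the function names, the frame labels, and the choice of the standard codon assignments are assumptions made for illustration only.

```python
# Minimal sketch of six-frame translation (illustrative; not the authors' pipeline).
# Assumption: standard codon assignments, built from the compact NCBI-style
# layout in which the bases are ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)

# Translation map used to complement each base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(seq: str) -> dict:
    """Translate a sequence in all six frames.

    Keys are hypothetical frame labels: '+1', '+2', '+3' for the forward
    strand and '-1', '-2', '-3' for the reverse-complement strand.
    Stop codons are written as '*'.
    """
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translate("ATGAAATAG").items():
        print(label, protein)
```

In a proteogenomic search, each translated frame would typically be split at stop codons ('*') and the resulting stop-to-stop segments written to a protein database for the search engine. That step is omitted here.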
Highlights
The structural and functional annotation of genomes is heavily based on data obtained using automated pipeline systems
The mapping of mass spectrometry-certified peptides onto the nucleotide sequence has been applied at the primary annotation phase for at least three microorganisms: Mycoplasma mobile [13], Deinococcus deserti [14], and Thermococcus gammatolerans [15]
All of the tandem mass spectrometry (MS/MS) spectra were searched against this database using the Mascot engine, resulting in the identification of a restricted set of 4425 probable open reading frames (ORFs)
Summary
The structural and functional annotation of genomes is heavily based on data obtained using automated pipeline systems. The mapping of mass spectrometry-certified peptides onto the nucleotide sequence has been applied at the primary annotation phase for at least three microorganisms: Mycoplasma mobile [13], Deinococcus deserti [14], and Thermococcus gammatolerans [15]. Complementary transcriptomic and proteomic approaches have already been integrated for Pristionchus pacificus [16] and Geobacter sulfurreducens [17]. The main drawback of both approaches is that, for organisms with a generalist lifestyle, i.e. those with large genomes, only a fraction of the transcriptome or the proteome can generally be observed under standard laboratory culture conditions [18].