Abstract

Background: Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. However, determining protein-coding genes for most new genomes is almost completely performed by inference using computational predictions with significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function.

Results: We experimentally annotated the bacterial pathogen Salmonella Typhimurium 14028, using "shotgun" proteomics to accurately uncover the translational landscape and post-translational features. The data provide protein-level experimental validation for approximately half of the predicted protein-coding genes in Salmonella and suggest revisions to several genes that appear to have incorrectly assigned translational start sites, including a potential novel alternate start codon. Additionally, we uncovered 12 non-annotated genes missed by gene prediction programs, as well as evidence suggesting a role for one of these novel ORFs in Salmonella pathogenesis. We also characterized post-translational features in the Salmonella genome, including chemical modifications and proteolytic cleavages. We find that bacteria have a much larger and more complex repertoire of chemical modifications than previously thought including several novel modifications. Our in vivo proteolysis data identified more than 130 signal peptide and N-terminal methionine cleavage events critical for protein function.

Conclusion: This work highlights several ways in which application of proteomics data can improve the quality of genome annotations to facilitate novel biological insights and provides a comprehensive proteome map of Salmonella as a resource for systems analysis.

Highlights

  • Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems

  • The genome of Salmonella Typhimurium (STM) 14028s was sequenced as described in the Methods section and is composed of two replicons: a main chromosome (4.87 Mb) and a plasmid (94 kb) with over 99% sequence homology to the Salmonella Typhimurium LT2 virulence plasmid pSLT

  • For the proteogenomic analysis described in this study we focused on chromosome-encoded genome features

Summary

Introduction

Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. Gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. Many aspects of modern biological research depend on accurate identification of the protein-coding genes in each genome, as well as the nature of the mature functional protein products, a process commonly referred to as genome annotation. Experimental evidence is typically based on expressed RNA sequences, such as those from microarray or RNA-Seq experiments. These genome-centric analyses do not independently and unequivocally determine whether a predicted protein-coding gene is translated into a protein, nor, importantly, do they provide any reliable information on post-translational processing.
