Abstract

Background: Modern biomedical research depends on a complete and accurate proteome. With the widespread adoption of new sequencing technologies, genome sequences are generated at a near-exponential rate, diminishing the time and effort that can be invested in genome annotation. The resulting gene set contains numerous errors in even the most basic form of annotation: the primary structure of the proteins.

Results: The application of experimental proteomics data to genome annotation, called proteogenomics, can quickly and efficiently discover misannotations, yielding a more accurate and complete genome annotation. We present a comprehensive proteogenomic analysis of the plague bacterium, Yersinia pestis KIM. We discover non-annotated genes, correct protein boundaries, remove spuriously annotated ORFs, and make major advances towards accurate identification of signal peptides. Finally, we apply our data to 21 other Yersinia genomes, correcting and enhancing their annotations.

Conclusions: In total, 141 gene models were altered and have been updated in RefSeq and GenBank, which can be accessed seamlessly through any NCBI tool (e.g. BLAST) or downloaded directly. Along with the improved gene models, we discover new, more accurate means of identifying signal peptides in proteomics data.

Highlights

  • Modern biomedical research depends on a complete and accurate proteome

  • Correcting Annotation Errors: Following the data path outlined in the Methods (Figure 1), ~15 million MS/MS spectra from Yersinia pestis KIM were searched by Inspect and PepNovo against the six-frame translation of the genome

  • Confident peptide/spectrum matches were mapped onto the genome sequence, and used to infer annotation improvements
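
The six-frame search and genome mapping described in the highlights can be illustrated with a minimal Python sketch. It is not the authors' pipeline: Inspect and PepNovo score MS/MS spectra, whereas this example assumes a peptide has already been identified and uses plain exact string matching. The function names (`six_frame_translation`, `map_peptide`), the toy sequence, and the use of the standard codon assignments are all assumptions made for illustration.

```python
from itertools import product

# Standard codon assignments, codons enumerated in T, C, A, G order (assumption:
# the standard table is adequate for this illustration).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(genome):
    """Translate both strands in all three reading frames.

    Returns a dict keyed by (strand, offset), where offset is the number of
    bases skipped at the 5' end of that strand.
    """
    genome = genome.upper()
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def map_peptide(peptide, frames, genome_length):
    """Locate exact peptide matches and report forward-strand coordinates.

    Each hit is (strand, start, end), 0-based and half-open on the forward
    strand, so genome[start:end] is the coding DNA (or its reverse complement
    for '-' hits).
    """
    hits = []
    for (strand, offset), protein in frames.items():
        pos = protein.find(peptide)
        while pos != -1:
            start = offset + 3 * pos
            end = start + 3 * len(peptide)
            if strand == "-":
                # Convert reverse-strand coordinates back to the forward strand.
                start, end = genome_length - end, genome_length - start
            hits.append((strand, start, end))
            pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    toy_genome = "ATGAAACCCGGGTTTTAA"  # hypothetical sequence, not Y. pestis data
    frames = six_frame_translation(toy_genome)
    print(map_peptide("MKPGF", frames, len(toy_genome)))
```

Because every frame on both strands is searched, a peptide that maps outside any annotated gene, or upstream of an annotated start in the same frame, becomes visible. That is the kind of evidence the highlights describe being used to infer annotation improvements.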


Summary

Introduction

Modern biomedical research depends on a complete and accurate proteome. With the widespread adoption of new sequencing technologies, genome sequences are generated at a near-exponential rate, diminishing the time and effort that can be invested in genome annotation. The resulting gene set contains numerous errors in even the most basic form of annotation: the primary structure of the proteins. Yersinia pestis, a Gram-negative bacterium, is the causative agent of bubonic and pneumonic plague. Seven Y. pestis genomes have been sequenced to completion, along with five other Yersinia sequences. Numerous other Yersinia have been sequenced to draft quality. Genome annotation is often divided into two sequential phases: finding genes and assigning function. Most prokaryotic genome annotation pipelines consist of automated gene finding, corroborated by limited homology comparisons. As such, they lack any experimental validation of primary structure. An accurate primary structure implies finding the correct start/stop of the gene, which may be erroneously predicted

