Lost and Found: Re-searching and Re-scoring Proteomics Data Aids Genome Annotation and Improves Proteome Coverage.

Patrick Willems, Petra Van Damme, Igor Fijalkowski

doi:10.1128/msystems.00833-20

Abstract

Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagate errors and underestimate genome complexity. We describe an optimized proteogenomic workflow that uses ribosome profiling (ribo-seq) and proteomic data for Salmonella enterica serovar Typhimurium to identify unannotated proteins or alternative protein forms. This data analysis encompasses the searching of cofragmenting peptides and postprocessing with extended peptide-to-spectrum quality features, including comparison to predicted fragment ion intensities. When this strategy is applied, an enhanced proteome depth is achieved, as well as greater confidence for unannotated peptide hits. We demonstrate the general applicability of our pipeline by reanalyzing public Deinococcus radiodurans data sets. Taken together, our results show that systematic reanalysis using available prokaryotic (proteome) data sets holds great promise to assist in experimentally based genome annotation.

IMPORTANCE Delineation of open reading frames (ORFs) causes persistent inconsistencies in prokaryote genome annotation. We demonstrate that by advanced (re)analysis of omics data, a higher proteome coverage and sensitive detection of unannotated ORFs can be achieved, which can be exploited for conditional bacterial genome (re)annotation, which is especially relevant in view of annotating the wealth of sequenced prokaryotic genomes obtained in recent years.

Highlights

Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagate errors and underestimate genome complexity
Since we were striving to search the full complement of possible genomic open reading frames (ORFs), all theoretical ORFs with a minimal length of 30 nucleotides and initiated from canonical ATG or near-cognate GTG and TTG start codons of the S
We applied an optimized proteogenomics workflow for bacteria to identify unannotated protein-coding ORFs and correct existing annotations
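
The ORF search space described in the highlights (all theoretical ORFs of at least 30 nucleotides starting at ATG, GTG, or TTG) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `find_orfs` and `Orf` are hypothetical, and several choices are assumptions made only for illustration: the genome is treated as linear, the length threshold counts the stop codon, and every in-frame start codon before a shared stop is reported as a separate candidate.

```python
from typing import Iterator, NamedTuple

START_CODONS = {"ATG", "GTG", "TTG"}   # canonical and near-cognate starts
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    strand: str    # "+" or "-"
    start: int     # 0-based offset on the scanned strand
    end: int       # exclusive offset, stop codon included (assumption)
    sequence: str


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(genome: str, min_length: int = 30) -> Iterator[Orf]:
    """Yield every start-to-stop ORF in all six reading frames."""
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            open_starts = []  # start positions awaiting an in-frame stop
            for pos in range(frame, len(seq) - 2, 3):
                codon = seq[pos:pos + 3]
                if codon in STOP_CODONS:
                    end = pos + 3
                    for start in open_starts:
                        if end - start >= min_length:
                            yield Orf(strand, start, end, seq[start:end])
                    open_starts = []
                elif codon in START_CODONS:
                    open_starts.append(pos)


if __name__ == "__main__":
    toy = "ATG" + "GTG" + "GCT" * 8 + "TAA"
    for orf in find_orfs(toy):
        print(orf.strand, orf.start, orf.end, len(orf.sequence))
```

Reporting every candidate start, rather than only the longest ORF per stop codon, keeps alternative start sites in the search space; this matters when the aim is to detect alternative protein forms.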

Summary

Introduction

Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagate errors and underestimate genome complexity. We describe an optimized proteogenomic workflow that uses ribosome profiling (ribo-seq) and proteomic data for Salmonella enterica serovar Typhimurium to identify unannotated proteins or alternative protein forms. This data analysis encompasses the searching of cofragmenting peptides and postprocessing with extended peptide-to-spectrum quality features, including comparison to predicted fragment ion intensities. In the widely adopted NCBI prokaryotic genome annotation pipeline, protein start site annotation, sequencing errors giving rise to interrupted genes, and the delineation of open reading frames (ORFs) based on homology or ab initio predictions remain persistent problems [5]. Comparing these predicted fragment ion intensities with those of the matched fragment ions aids in discriminating correct peptide-to-spectrum matches (PSMs) by machine learning [20, 21, 22]. As such, these fragment intensity correlation features are especially useful for attaining a higher confidence for novel (i.e., database unannotated) peptide identifications [23, 24]. We further elaborate how (ribo)proteogenomics is instrumental in reannotating ORFs, discovering novel ORFs across bacteria, and genome annotation in general.
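
To make the idea of fragment intensity correlation features concrete, the following Python sketch compares predicted and observed fragment ion intensities for a single PSM. It is an illustration under stated assumptions, not the feature set used in the paper: the function names `pearson` and `intensity_features`, the ion labels, and the three features computed here are hypothetical choices, and observed intensities are scaled to their maximum before comparison.

```python
import math
from typing import Dict, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def intensity_features(predicted: Dict[str, float],
                       observed: Dict[str, float]) -> Dict[str, float]:
    """Summarise agreement between predicted and observed fragment ions.

    Both dicts map an ion label (e.g. "y3") to an intensity. Predicted
    ions missing from the spectrum are scored as intensity 0.
    """
    ions = sorted(predicted)
    pred = [predicted[ion] for ion in ions]
    obs = [observed.get(ion, 0.0) for ion in ions]
    top = max(obs, default=0.0)
    if top > 0:
        obs = [value / top for value in obs]  # scale to the base peak
    n = len(ions)
    return {
        "pearson": pearson(pred, obs),
        "mean_abs_diff": sum(abs(p - o) for p, o in zip(pred, obs)) / n if n else 0.0,
        "matched_fraction": sum(ion in observed for ion in ions) / n if n else 0.0,
    }


if __name__ == "__main__":
    predicted = {"b2": 0.2, "y3": 1.0, "y4": 0.5}
    observed = {"b2": 20.0, "y3": 100.0, "y4": 50.0}
    print(intensity_features(predicted, observed))
```

Features of this kind would be appended to the usual search engine scores before a machine learning rescoring step, so that a PSM whose observed fragmentation pattern disagrees with the prediction is penalized even if its mass matches are good.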

Journal: mSystems	Publication Date: Oct 27, 2020
Citations: 15	License type: CC BY 4.0

Similar Papers

REPARATION: ribosome profiling assisted (re-)annotation of bacterial genomes.
Elvis Ndah ... Eivind Valen
Nucleic Acids Research | VOL. 45
31 Aug 2017

VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data
Elena S Peterson ... William R Cannon
BMC Genomics | VOL. 13
05 Apr 2012

Translation Initiation Site Profiling Reveals Widespread Synthesis of Non-AUG-Initiated Protein Isoforms in Yeast.
Amy R Eisenberg ... Marko Jovanovic
Cell systems | VOL. 11
24 Jul 2020

A Proteogenomic Survey of the Medicago truncatula Genome
Jeremy D Volkening ... Michael R Sussman
Molecular & Cellular Proteomics | VOL. 11
01 Oct 2012
