Abstract

The combination of tandem mass spectrometry and sequence database searching is the method of choice for the identification of peptides and the mapping of proteomes. Over the last several years, the volume of data generated in proteomic studies has increased dramatically, which challenges the computational approaches previously developed for these data. Furthermore, a multitude of search engines have been developed that identify different, overlapping subsets of the sample peptides from a particular set of tandem mass spectra. We present iProphet, a new addition to the Trans-Proteomics Pipeline, a widely used open-source suite of proteomic data analysis tools. Applied in tandem with PeptideProphet, it provides a more accurate representation of the multilevel nature of shotgun proteomic data. iProphet combines the evidence from multiple identifications of the same peptide sequences across different spectra, experiments, precursor ion charge states, and modified states. It also allows accurate and effective integration of the results from multiple database search engines applied to the same data. The use of iProphet in the Trans-Proteomics Pipeline increases the number of correctly identified peptides at a constant false discovery rate as compared with both PeptideProphet and another state-of-the-art tool, Percolator. As the main outcome, iProphet permits the calculation of accurate posterior probabilities and false discovery rate estimates at the level of sequence-identical peptide identifications, which in turn leads to more accurate probability estimates at the protein level. Fully integrated with the Trans-Proteomics Pipeline, it supports all commonly used MS instruments, search engines, and computer platforms.
The performance of iProphet is demonstrated on two publicly available data sets: data from a human whole cell lysate proteome profiling experiment representative of typical proteomic data sets, and from a set of Streptococcus pyogenes experiments more representative of organism-specific composite data sets.

Highlights

  • A combination of protein digestion, liquid chromatography and tandem mass spectrometry (LC-MS/MS), often referred to as shotgun proteomics, has become a robust and powerful proteomics technology

  • PeptideProphet and ProteinProphet: before introducing the extended modeling framework provided by iProphet, it is informative to briefly summarize the conventional PeptideProphet and ProteinProphet approach to the analysis of shotgun proteomic data

  • PeptideProphet takes as input all peptide-to-spectrum matches (PSMs) from the entire experiment

Summary

Introduction

A combination of protein digestion, liquid chromatography and tandem mass spectrometry (LC-MS/MS), often referred to as shotgun proteomics, has become a robust and powerful proteomics technology. In recent years there has been substantial progress in developing bioinformatics and statistical tools in support of shotgun proteomic data. This includes the development of new and improved tandem MS (MS/MS) database search algorithms, as well as statistical methods for estimating the false discovery rate (FDR) and posterior peptide and protein probabilities (reviewed in [2, 6]). There is a growing interest in the analysis of MS/MS data using a combination of multiple search engines, with the intent to maximize the number and confidence of peptide and protein identifications. This approach has become computationally feasible with the availability of faster computers, the prevalence of computing clusters, and the recent emergence of cloud computing. Combining the results of multiple searches presents additional technical challenges, including the heterogeneity of search engine scores, the propagation of errors, and informatics challenges related to nonuniform data formats.
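To make the link between posterior probabilities and FDR estimates concrete, here is a minimal Python sketch of the general model-based idea: if each accepted identification carries an accurate posterior probability p of being correct, the expected fraction of false identifications among those accepted is the mean of (1 - p). This only illustrates that relationship and is not the Trans-Proteomics Pipeline's implementation; the function name `model_based_fdr` and the sample probabilities are hypothetical.

```python
from typing import Sequence


def model_based_fdr(probabilities: Sequence[float], threshold: float) -> float:
    """Estimate the FDR among identifications with probability >= threshold.

    Assumes each value is an accurate posterior probability of being correct,
    so the expected number of false identifications is the sum of (1 - p).
    """
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        # Nothing passes the threshold, so there are no false identifications.
        return 0.0
    expected_false = sum(1.0 - p for p in accepted)
    return expected_false / len(accepted)


# Hypothetical posterior probabilities for five identifications.
psm_probabilities = [0.99, 0.95, 0.90, 0.60, 0.20]
fdr_at_09 = model_based_fdr(psm_probabilities, threshold=0.9)
```

With a threshold of 0.9, three identifications are accepted and the estimate is the average of their error probabilities. The estimate is only as good as the probabilities themselves, which is why accurate posterior probabilities matter for the FDR estimates discussed above.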
