Abstract

A major goal in proteomics is the comprehensive and accurate description of a proteome. This task includes not only the identification of proteins in a sample, but also the accurate quantification of their abundance. Although mass spectrometry typically provides information on peptide identity and abundance in a sample, it does not directly measure the concentration of the corresponding proteins. Specifically, most mass-spectrometry-based approaches (e.g. shotgun proteomics or selected reaction monitoring) allow one to quantify peptides using chromatographic peak intensities or spectral counting information. Ultimately, based on these measurements, one wants to infer the concentrations of the corresponding proteins. Inferring properties of the proteins based on experimental peptide evidence is often a complex problem because of the ambiguity of peptide assignments and different chemical properties of the peptides that affect the observed concentrations. We present SCAMPI, a novel generic and statistically sound framework for computing protein abundance scores based on quantified peptides. In contrast to most previous approaches, our model explicitly includes information from shared peptides to improve protein quantitation, especially in eukaryotes with many homologous sequences. The model accounts for uncertainty in the input data, leading to statistical prediction intervals for the protein scores. Furthermore, peptides with extreme abundances can be reassessed and classified as either regular data points or actual outliers. We used the proposed model with several datasets and compared its performance to that of other, previously used approaches for protein quantification in bottom-up mass spectrometry.

Highlights

  • We used the proposed model with several datasets and compared its performance to that of other, previously used approaches for protein quantification in bottom-up mass spectrometry

  • The “Results” section of this paper describes the application of SCAMPI to three datasets and compares its performance to that of some previously used protein quantification methods, namely, TOPn [2] and MaxQuant [10], which are briefly discussed below

  • The protein concentration scores computed by SCAMPI were compared with the ground truth and with the results obtained using the quantification tools employed by the authors of the data
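
To make the step from quantified peptides to a protein score concrete, the following is a minimal sketch of a TOPn-style score, assuming the common definition in which a protein's score is the mean intensity of its n most intense peptides. The function name `topn_scores`, the input layout, the toy data, and the choice to drop shared peptides are all illustrative assumptions and are not taken from the paper. This is not SCAMPI, which the abstract describes as explicitly using information from shared peptides.

```python
from collections import defaultdict
from statistics import mean


def topn_scores(peptides, n=3):
    """Return {protein: mean intensity of its n most intense unique peptides}.

    `peptides` is an iterable of (sequence, intensity, proteins) tuples, where
    `proteins` lists every protein the peptide sequence maps to. This layout
    is a hypothetical one chosen for the sketch.
    """
    by_protein = defaultdict(list)
    for _sequence, intensity, proteins in peptides:
        # Assumption: peptides shared between several proteins are dropped.
        if len(proteins) == 1:
            by_protein[proteins[0]].append(intensity)
    return {
        protein: mean(sorted(values, reverse=True)[:n])
        for protein, values in by_protein.items()
    }


# Toy data, invented for illustration only.
example = [
    ("PEPA", 10.0, ["P1"]),
    ("PEPB", 8.0, ["P1"]),
    ("PEPC", 6.0, ["P1"]),
    ("PEPD", 2.0, ["P1"]),
    ("PEPE", 5.0, ["P1", "P2"]),  # shared peptide, ignored in this sketch
    ("PEPF", 4.0, ["P2"]),
]
scores = topn_scores(example, n=3)
```

Dropping shared peptides, as this sketch does, discards evidence. That is the limitation the abstract points to for eukaryotes with many homologous sequences.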


Summary

Introduction

We used the proposed model with several datasets and compared its performance to that of other, previously used approaches for protein quantification in bottom-up mass spectrometry. Although protein identification remains an important topic of ongoing research, the focus has moved to quantification in recent years. It is important to know which proteins are present in a sample, but the abundance of these molecules is also of major interest. Most methods for analyzing mass spectrometry (MS)-based proteomics data rely on a sequential approach: first identification and then quantification of the peptides and proteins in a sample [5, 6]. The inference of protein abundance relies on quantitative information about the corresponding peptides, usually either chromatographic peak intensities (ion currents) or spectral counts (the number of recorded MS/MS spectra). The protein inference problem [12] consists of deciding which protein sequences are present in a sample based on the set of identified peptide sequences. It has been ad-

