Abstract
A major goal in proteomics is the comprehensive and accurate description of a proteome. This task includes not only the identification of proteins in a sample, but also the accurate quantification of their abundance. Although mass spectrometry typically provides information on peptide identity and abundance in a sample, it does not directly measure the concentration of the corresponding proteins. Specifically, most mass-spectrometry-based approaches (e.g. shotgun proteomics or selected reaction monitoring) allow one to quantify peptides using chromatographic peak intensities or spectral counting information. Ultimately, based on these measurements, one wants to infer the concentrations of the corresponding proteins. Inferring properties of the proteins based on experimental peptide evidence is often a complex problem because of the ambiguity of peptide assignments and different chemical properties of the peptides that affect the observed concentrations. We present SCAMPI, a novel generic and statistically sound framework for computing protein abundance scores based on quantified peptides. In contrast to most previous approaches, our model explicitly includes information from shared peptides to improve protein quantitation, especially in eukaryotes with many homologous sequences. The model accounts for uncertainty in the input data, leading to statistical prediction intervals for the protein scores. Furthermore, peptides with extreme abundances can be reassessed and classified as either regular data points or actual outliers. We used the proposed model with several datasets and compared its performance to that of other, previously used approaches for protein quantification in bottom-up mass spectrometry.
Highlights
We used the proposed model with several datasets and compared its performance to that of other, previously used approaches for protein quantification in bottom-up mass spectrometry.
The “Results” section of this paper describes the application of SCAMPI to three datasets and compares its performance to that of some previously used protein quantification methods, namely TOPn [2] and MaxQuant [10], which are briefly discussed below.
The protein concentration scores computed by SCAMPI were compared with the ground truth and with the results obtained using the quantification tools used by the authors of the data.
Summary
We used the proposed model with several datasets and compared its performance to that of other, previously used approaches for protein quantification in bottom-up mass spectrometry. Although protein identification remains an important topic of ongoing research, the focus has moved to quantification in recent years. It is important to know which proteins are present in a sample, but the abundance of these molecules is also of major interest. Most methods for analyzing mass spectrometry (MS)-based proteomics data rely on a sequential approach: first identification and then quantification of the peptides and proteins in a sample [5, 6]. The inference of protein abundance relies on quantitative information about the corresponding peptides, usually either chromatographic peak intensities (ion currents) or spectral counts (the number of recorded MS/MS spectra). The protein inference problem [12] consists of deciding which protein sequences are present in a sample based on the set of identified peptide sequences. It has been ad-