Abstract
Most implementations of mass spectrometry-based proteomics involve enzymatic digestion of proteins, expanding the analysis to multiple proteolytic peptides for each protein. Currently, there is no consensus on how to summarize peptide abundances into protein concentrations. Such efforts are complicated by the fact that error control is normally applied to the identification process and does not directly control errors linking peptide abundance measures to protein concentration. Peptides resulting from suboptimal digestion or being partially modified are not representative of the protein concentration. Without a mechanism to remove such unrepresentative peptides, their abundance adversely impacts the estimation of their protein's concentration. Here, we present a relative quantification approach, Diffacto, that applies factor analysis to extract the covariation of peptide abundances. The method enables a weighted geometric average summarization and the automatic elimination of incoherent peptides. We demonstrate, based on a set of controlled label-free experiments using standard mixtures of proteins, that the covariation structure extracted by the factor analysis accurately reflects protein concentrations. In the data set filtered at 1% peptide-spectrum match (PSM)-level FDR, as many as 11% of the peptides have abundance differences incoherent with the other peptides attributed to the same protein. If not controlled, such contradictory peptide abundances have a severe impact on protein quantification. When summing the quantities of each protein's three most abundant peptides, as many as 14% of the proteins are estimated to have a negative correlation with their actual concentration differences between samples. Diffacto reduced the proportion of such obviously incorrectly quantified proteins to 1.6%. Furthermore, by analyzing clinical data sets from two breast cancer studies, our method revealed persistent proteomic signatures linked to three subtypes of breast cancer.
We conclude that Diffacto can facilitate the interpretation and enhance the utility of most types of proteomics data.
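The sign errors reported for summing each protein's three most abundant peptides can be illustrated with a small sketch. It assumes peptides are ranked by their mean intensity across samples (implementations differ on this) and uses made-up intensities for one hypothetical protein; the function name `top3_abundance` is illustrative and not part of Diffacto.

```python
import numpy as np

def top3_abundance(peptide_intensities):
    """Sum the three most abundant peptides of one protein in each sample.

    peptide_intensities: array of shape (n_peptides, n_samples), linear scale.
    Peptides are ranked by mean intensity across samples (an assumption).
    """
    x = np.asarray(peptide_intensities, dtype=float)
    order = np.argsort(x.mean(axis=1))[::-1]  # rank peptides, most intense first
    top = x[order[:3]]                        # keep at most three peptides
    return top.sum(axis=0)                    # one protein estimate per sample

# Hypothetical protein whose true concentration doubles from sample A to sample B.
coherent = np.array([
    [1.0e6, 2.0e6],
    [4.0e5, 8.0e5],
    [2.0e5, 4.0e5],
])
# A hypothetical, very intense peptide that moves the opposite way,
# e.g. because it is partially modified or poorly digested in sample B.
incoherent = np.array([[5.0e6, 1.0e6]])

est_clean = top3_abundance(coherent)
est_mixed = top3_abundance(np.vstack([coherent, incoherent]))
print(est_clean[1] / est_clean[0])  # 2.0
print(est_mixed[1] / est_mixed[0])  # 0.59375
```

With only the coherent peptides, the estimate recovers the twofold increase. Adding one intense but incoherent peptide pushes it into the top three, and the protein now appears to decrease, which is the kind of obviously incorrect quantification described above.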
Highlights
Because a typical experiment involves proteolytic digestion, the actual analytes measured by liquid chromatography-tandem mass spectrometry (LC-MS/MS) are the proteolytic peptides of the analyzed proteins.
Transforming the abundance scale balanced the contributions of each protein's peptides, despite the vastly different ion intensities observed in LC-MS/MS experiments.
Based on the proportionality principle, every observed peptide abundance should be a combination of two parts: the signal responding to the relative change in protein concentration (z), plus noise mainly caused by measurement errors.
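The scale transformation and the two-part model in these highlights can be sketched as follows, writing the model for peptide i in sample j as y[i, j] ≈ w[i] * z[j] + noise. This is a minimal sketch under stated assumptions: the transformation is taken to be a log2 transform followed by removing each peptide's mean across samples, the per-peptide loading `w` is assumed notation, and a rank-1 SVD stands in for the factor analysis used by Diffacto (unlike factor analysis, it does not estimate a separate noise variance per peptide). All data and function names are hypothetical.

```python
import numpy as np

def center_log_abundances(intensities):
    """Log2-transform and subtract each peptide's mean across samples.

    After centering, each row reflects only relative change between samples,
    so a peptide 100x more intense than another no longer dominates.
    """
    y = np.log2(np.asarray(intensities, dtype=float))
    return y - y.mean(axis=1, keepdims=True)

def rank1_factor(y):
    """Approximate y (peptides x samples) as outer(w, z) with a rank-1 SVD.

    w: one loading per peptide; z: one relative-abundance value per sample.
    SVD signs are arbitrary, so both are flipped to make the loadings sum positive.
    """
    u, s, vt = np.linalg.svd(y, full_matrices=False)
    w = u[:, 0] * np.sqrt(s[0])
    z = vt[0] * np.sqrt(s[0])
    if np.sum(w) < 0:
        w, z = -w, -z
    return w, z

rng = np.random.default_rng(0)
true_z = np.log2([1.0, 1.0, 2.0, 2.0, 4.0, 4.0])  # hypothetical protein levels
ion_efficiency = np.array([1e7, 1e5, 3e6, 2e4])   # vastly different peptide intensities
noise = rng.normal(0.0, 0.1, size=(4, 6))         # measurement error on the log2 scale
intensities = ion_efficiency[:, None] * 2.0 ** (true_z[None, :] + noise)

y = center_log_abundances(intensities)
w, z = rank1_factor(y)
print(np.round(w, 2))                    # roughly equal loadings per peptide
print(np.corrcoef(z, true_z)[0, 1])      # expected to be close to 1
```

Centering removes the ionization-efficiency term entirely, which is why peptides whose raw intensities differ by orders of magnitude end up with similar loadings.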
Summary
By putting more trust in peptides that demonstrate a stronger covariation with the other peptides from the same protein, one can make better use of the proportionality principle. Other approaches that use such covariation information have been shown to improve the validity of protein inference and signal integration [16–18], or to provide a basis for selecting peptides for quantitative analysis [19, 20]. These approaches have drawbacks, such as dependence on specific quantification techniques or difficulty handling missing values, and they often incorrectly treat all peptides as independent variables when summarizing each individual LC-MS/MS experiment.
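The idea of trusting peptides in proportion to how well they covary with the rest of their protein can be sketched in a few lines. This is not Diffacto's factor-analysis weighting; it assumes, for illustration only, that each peptide's weight is its Pearson correlation with the mean profile of the protein's other peptides, that peptides at or below a hypothetical cutoff `min_corr` are incoherent and get zero weight, and that the summary is a weighted mean of centered log2 abundances, i.e. a weighted geometric mean on the linear scale.

```python
import numpy as np

def covariation_weighted_summary(log_abundances, min_corr=0.0):
    """Summarize centered log2 peptide abundances (peptides x samples) into one
    protein profile, weighting each peptide by how well it tracks the others.

    min_corr is a hypothetical cutoff: peptides whose correlation with the
    leave-one-out consensus is <= min_corr are excluded as incoherent.
    Returns (protein_profile, weights).
    """
    y = np.asarray(log_abundances, dtype=float)
    n = y.shape[0]
    corr = np.empty(n)
    for i in range(n):
        others = np.delete(y, i, axis=0).mean(axis=0)  # consensus of the other peptides
        corr[i] = np.corrcoef(y[i], others)[0, 1]
    weights = np.where(corr > min_corr, corr, 0.0)     # incoherent peptides get weight 0
    if weights.sum() == 0:
        raise ValueError("no coherent peptides for this protein")
    profile = weights @ y / weights.sum()              # weighted mean in log space
    return profile, weights

# Hypothetical centered log2 profiles over four samples: three peptides agree,
# while the fourth moves in the opposite direction.
y = np.array([
    [-1.0, -0.5, 0.5, 1.0],
    [-0.9, -0.6, 0.4, 1.1],
    [-1.1, -0.4, 0.6, 0.9],
    [1.0, 0.5, -0.5, -1.0],
])
profile, weights = covariation_weighted_summary(y)
print(np.round(weights, 2))
print(np.round(profile, 2))
```

The contradicting peptide correlates negatively with the consensus and is dropped, so it cannot pull the protein estimate in the wrong direction; the remaining peptides contribute in proportion to their agreement.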