Abstract

In shotgun proteomics, the analysis of label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data. This generally causes a low sensitivity in differential expression analysis. Here, we propose a quantification-first approach for peptides that reverses the classical identification-first workflow, thereby preventing valuable information from being discarded in the identification stage. Specifically, we introduce a method, Quandenser, that applies unsupervised clustering at both the MS1 and MS2 levels to summarize all analytes of interest without assigning identities. The resulting data reduction shortens search time, so we can now employ open modification and de novo searches to identify analytes of interest that would have gone unnoticed in traditional pipelines. Quandenser+Triqler outperforms the state-of-the-art method MaxQuant+Perseus, consistently reporting more differentially abundant proteins for all tested datasets. Software is available for all major operating systems at https://github.com/statisticalbiotechnology/quandenser, under the Apache 2.0 license.

Highlights

  • In shotgun proteomics, the analysis of label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data

  • We note that label-free quantification (LFQ) is sometimes seen as cumbersome because, unlike with isobaric labeling, for instance, one is not guaranteed a readout for an identified peptide in each sample

  • We compared these quantification-first setups to three identification-first setups: (1) MaxQuant + Perseus with MBR[9,25], (2) MaxQuant + empirical Bayesian random censoring threshold model (EBRCT) with MBR[9,26], and (3) Tide followed by Triqler, with neither clustering at the MS1 and MS2 levels nor MBR, but with feature detection using Dinosaur[27]


Summary

Introduction

The analysis of label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data. This generally causes a low sensitivity in differential expression analysis. We introduce a method, Quandenser, that applies unsupervised clustering at both the MS1 and MS2 levels to summarize all analytes of interest without assigning identities. The resulting data reduction shortens search time. LFQ and quantitative proteomics in general are struggling to obtain sufficient coverage of the proteome[2] and suffer from low sensitivity for differentially abundant proteins at false discovery rate thresholds[3]. While this can partially be attributed to inherent limitations in the methodology of mass spectrometry, it is, to a high degree, caused by the inadequacy of our current data analysis pipelines. Clustering of fragment spectra was recently demonstrated to give better sensitivity to LFQ experiments[15].
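
To make the quantification-first idea concrete, the following is a minimal sketch of grouping MS1 features across runs by m/z and retention time before any identities are assigned. It is not Quandenser's actual algorithm: the `Feature` class, the `cluster_features` and `quant_table` helpers, the greedy matching strategy, and the tolerance values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A hypothetical MS1 feature; no peptide identity is attached."""
    run: str          # name of the LC-MS run the feature was detected in
    mz: float         # mass-to-charge ratio
    rt: float         # retention time in minutes
    intensity: float  # integrated signal


def cluster_features(features, ppm_tol=10.0, rt_tol=0.5):
    """Greedily group features whose m/z and retention time are close.

    Each feature joins the first cluster whose seed feature lies within
    the (illustrative) tolerances; otherwise it starts a new cluster.
    """
    clusters = []
    for f in sorted(features, key=lambda x: (x.mz, x.rt)):
        for c in clusters:
            seed = c[0]
            ppm_diff = abs(f.mz - seed.mz) / seed.mz * 1e6
            if ppm_diff <= ppm_tol and abs(f.rt - seed.rt) <= rt_tol:
                c.append(f)
                break
        else:
            clusters.append([f])
    return clusters


def quant_table(clusters, runs):
    """One row per cluster, one column per run; None marks a missing readout."""
    rows = []
    for c in clusters:
        by_run = {}
        for f in c:
            by_run[f.run] = by_run.get(f.run, 0.0) + f.intensity
        rows.append([by_run.get(r) for r in runs])
    return rows


features = [
    Feature("A", 500.2500, 20.1, 1e6),
    Feature("B", 500.2502, 20.3, 2e6),
    Feature("A", 650.3300, 35.0, 5e5),
]
clusters = cluster_features(features)
table = quant_table(clusters, runs=["A", "B"])
```

In this toy input the two features near m/z 500.25 fall into one cluster spanning both runs, while the feature near m/z 650.33 is seen only in run A, so its row has no readout for run B. Only the cluster representatives would then need to be passed on to an identification step, which is where the data reduction comes from.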

