Abstract

Comprehensive characterization of a proteome is a fundamental goal in proteomics. To achieve saturation coverage of a proteome or specific subproteome via tandem mass spectrometric identification of tryptic protein sample digests, proteomics data sets are growing dramatically in size and heterogeneity. The trend toward very large integrated data sets poses as yet unsolved challenges in controlling the uncertainty of protein identifications, which go beyond the well-established confidence measures for peptide-spectrum matches. We present MAYU, a novel strategy that reliably estimates false discovery rates for protein identifications in large-scale data sets. We validated and applied MAYU using various large proteomics data sets. The data show that the size of the data set has an important and previously underestimated impact on the reliability of protein identifications. In particular, we found that protein false discovery rates are significantly elevated compared with those of peptide-spectrum matches. The function provided by MAYU is critical for controlling the quality of proteome data repositories and thereby for enhancing any study relying on these data sources. MAYU is available as standalone software and is also integrated into the Trans-Proteomic Pipeline.

Highlights

  • MAYU implements a target-decoy strategy to estimate the false discovery rate (FDR) for a set of protein identifications compiled from a selection of peptide-spectrum matches (PSMs)

  • We argue that the protein identification FDR estimates produced by MAYU are accurate in the context of typical proteomics studies in the range of 50 LC-MS/MS runs
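The gap between PSM-level and protein-level error rates can be illustrated with a toy calculation. The sketch below is not MAYU's estimator, which this excerpt does not describe in detail. It only applies a naive decoy-to-target ratio at both levels, assuming equal-sized target and decoy databases. The function name `naive_fdrs`, the protein labels, and the counts are hypothetical. The point it shows is that correct PSMs tend to pile up on a few proteins, whereas false PSMs scatter across many, so the same PSM selection can give a much higher ratio at the protein level.

```python
from typing import List, Tuple

def naive_fdrs(psms: List[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (psm_ratio, protein_ratio) for a list of (protein_id, is_decoy).

    Both values are plain decoy/target ratios. This is an illustrative
    simplification, not the estimator implemented by MAYU.
    """
    target_psms = sum(1 for _, is_decoy in psms if not is_decoy)
    decoy_psms = sum(1 for _, is_decoy in psms if is_decoy)
    # Collapse PSMs to distinct proteins: many PSMs may map to one protein.
    target_proteins = {pid for pid, is_decoy in psms if not is_decoy}
    decoy_proteins = {pid for pid, is_decoy in psms if is_decoy}
    psm_ratio = decoy_psms / target_psms if target_psms else 0.0
    protein_ratio = (
        len(decoy_proteins) / len(target_proteins) if target_proteins else 0.0
    )
    return psm_ratio, protein_ratio

# Hypothetical selection: 10 target PSMs on 4 proteins, 2 decoy PSMs on 2 proteins.
toy_psms = (
    [("P1", False)] * 4
    + [("P2", False)] * 3
    + [("P3", False)] * 2
    + [("P4", False)]
    + [("D1", True), ("D2", True)]
)
psm_ratio, protein_ratio = naive_fdrs(toy_psms)
print(psm_ratio, protein_ratio)  # 0.2 at the PSM level, 0.5 at the protein level
```

In this toy selection the decoy ratio rises from 0.2 to 0.5 once PSMs are collapsed to proteins, which is the qualitative effect the abstract reports for large data sets.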

Summary

Introduction

Proteomics studies aiming at extensive proteome coverage generate very large data sets consisting of up to millions of fragment ion spectra. Shotgun proteomics experiments essentially aim to compile a set of reliable protein identifications covering the proteome as extensively as possible. The false discovery rate (FDR) [21], i.e., the expected fraction of false positive assignments, has become a widely used measure of the reliability of peptide-spectrum matches (PSMs). The FDR for PSMs can be confidently estimated by means of decoy database search strategies, in which the acquired fragment ion spectra are searched against a chimeric protein database containing all (target) protein sequences possibly present in the analyzed sample and an equal number of nonsense (decoy) sequences. Target-decoy strategies are appealing because they constitute a generic and independent approach to validating PSMs generated by any type of identification strategy.
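The decoy search described above can be turned into a PSM-level FDR estimate in a few lines. The following is a minimal sketch under stated assumptions: the target and decoy databases are the same size, so decoy hits above a score threshold approximate the number of false target hits, and the FDR is taken as the decoy-to-target ratio (one common convention among several). The function name `psm_fdr`, the score scale, and the example data are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

def psm_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the PSM-level FDR among PSMs scoring at or above `threshold`.

    `psms` holds (score, is_decoy) pairs from a combined target-decoy search.
    Assumes equal-sized target and decoy databases (an assumption of this
    sketch), so each decoy hit stands in for one expected false target hit.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return min(1.0, decoys / targets)

# Hypothetical search results: (score, matched a decoy sequence?)
example_psms = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True),
    (7.5, False), (6.8, False), (6.1, True), (5.9, False),
]
print(psm_fdr(example_psms, 7.0))  # 4 targets and 1 decoy pass: 0.25
```

Lowering the threshold admits more PSMs but also more decoys, so in practice the threshold is chosen as the most permissive score that keeps the estimate below a desired FDR.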

