Accurate Estimation of Context-Dependent False Discovery Rates in Top-Down Proteomics

Richard D Leduc, Ryan T Fellers, Bryan P Early, Joseph B Greer, Daniel P Shams, Paul M Thomas, Neil L Kelleher

doi:10.1074/mcp.ra118.000993

Open Access

https://doi.org/10.1074/mcp.ra118.000993

Journal: Molecular & Cellular Proteomics
Publication Date: Apr 1, 2019
Citations: 32
License type: CC BY

Affiliation: Northwestern University

Abstract

Within the last several years, top-down proteomics has emerged as a high-throughput technique for protein and proteoform identification. This technique has the potential to identify and characterize thousands of proteoforms within a single study, but the absence of accurate false discovery rate (FDR) estimation could hinder the adoption and consistency of top-down proteomics in the future. In automated identification and characterization of proteoforms, FDR calculation strongly depends on the context of the search. The context includes MS data quality, the database being interrogated, the search engine, and the parameters of the search. Particular to top-down proteomics, there are four molecular levels of study: proteoform spectral match (PrSM), protein, isoform, and proteoform. Here, a context-dependent framework for calculating an accurate FDR at each level was designed, implemented, and validated against a manually curated training set with 546 confirmed proteoforms. We examined several search contexts and found that an FDR calculated at the PrSM level under-reported the true FDR at the protein level by an average of 24-fold. We present a new open-source tool, the TDCD_FDR_Calculator, which provides a scalable, context-dependent FDR calculation that can be applied post-search to enhance the quality of results in top-down proteomics from any search engine.

Highlights

Accurate and efficient false discovery rate (FDR) determination of protein and proteoform identifications is needed to improve top-down proteomics for large-scale, automated proteoform discovery and relative quantification [1,2].
The list of proteins identified at a 1% protein-level FDR is not the same as the list of proteins obtained from proteoform spectral matches (PrSMs) identified at a 1% FDR.
The data used in the training set from Park et al. yield 298 proteins when aggregated with a 1% protein-level context-dependent FDR (CD FDR), but 324 proteins when aggregated at the PrSM level and naïvely merged.

Summary

Introduction

Accurate and efficient false discovery rate (FDR) determination of protein and proteoform identifications is needed to improve top-down proteomics for large-scale, automated proteoform discovery (qualitative analysis) and relative quantification (quantitative analysis) [1,2]. A context-dependent framework for calculating an accurate FDR at each level was designed, implemented, and validated against a manually curated training set with 546 confirmed proteoforms. We present a logical structure for calculating an identification FDR at the proteoform, isoform, and protein level using PrSMs from their given search context.
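
To illustrate why the level of aggregation matters, the sketch below applies a generic target-decoy estimate (decoy hits divided by target hits) first to individual PrSMs and then to proteins obtained by collapsing those PrSMs. It is not the algorithm used by the TDCD_FDR_Calculator; the PrSM class, the function names, the accessions, the scores, and the threshold are all hypothetical, and the toy numbers are chosen only to show the direction of the effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrSM:
    protein: str     # accession of the matched database entry (hypothetical IDs)
    score: float     # assumed convention: higher score means a better match
    is_decoy: bool   # True when the match is to a decoy entry


def decoy_target_ratio(flags):
    """Generic target-decoy estimate: (# decoy hits) / (# target hits)."""
    flags = list(flags)
    decoys = sum(flags)
    targets = len(flags) - decoys
    return decoys / targets if targets else 0.0


def prsm_level_fdr(prsms, threshold):
    """Estimate the FDR over individual PrSMs that pass the score threshold."""
    return decoy_target_ratio(p.is_decoy for p in prsms if p.score >= threshold)


def protein_level_fdr(prsms, threshold):
    """Collapse passing PrSMs to distinct proteins, then estimate the FDR."""
    proteins = {(p.protein, p.is_decoy) for p in prsms if p.score >= threshold}
    return decoy_target_ratio(is_decoy for (_, is_decoy) in proteins)


# Toy data (assumed): 100 target PrSMs spread over 4 proteins, 1 decoy PrSM.
toy = (
    [PrSM("P1", 50.0, False)] * 60
    + [PrSM("P2", 40.0, False)] * 25
    + [PrSM("P3", 30.0, False)] * 10
    + [PrSM("P4", 20.0, False)] * 5
    + [PrSM("DECOY_1", 15.0, True)]
)

print(prsm_level_fdr(toy, threshold=10.0))     # 0.01
print(protein_level_fdr(toy, threshold=10.0))  # 0.25
```

In this toy case, many target PrSMs collapse onto a few proteins while the single decoy PrSM still counts as one decoy protein, so the same set of matches has a much higher estimated FDR at the protein level than at the PrSM level.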
