A new estimation of protein-level false discovery rate

Guanying Wu, Xiang Wan, Baohua Xu

doi:10.1186/s12864-018-4923-3


Open Access

https://doi.org/10.1186/s12864-018-4923-3


Abstract

Background: In mass spectrometry-based proteomics, protein identification is an essential task. Evaluating the statistical significance of the protein identification result is critical to the success of proteomics studies. Controlling the false discovery rate (FDR) is the most common method for assuring the overall quality of the set of identifications. Existing FDR estimation methods either rely on specific assumptions or use a two-stage calculation that first estimates the error rates at the peptide level and then combines them in some way at the protein level. We propose to estimate the FDR in a non-parametric way with fewer assumptions and to avoid the two-stage calculation process.

Results: We propose a new protein-level FDR estimation framework. The framework contains two major components: the Permutation+BH (Benjamini–Hochberg) FDR estimation method and the logistic regression-based null inference method. In Permutation+BH, the null distribution of a sample is generated by searching the data against a large number of permuted random protein databases, and it therefore does not rely on specific assumptions. Then, p-values of proteins are calculated from the null distribution, and the BH procedure is applied to the p-values to obtain the relationship between the FDR and the number of protein identifications. Because generating the null distribution by permutation is inefficient for online identification, a logistic regression model is proposed to infer the null distribution of a new sample from existing null distributions obtained with the Permutation+BH method.

Conclusions: In our experiments on three publicly available datasets, our Permutation+BH method achieves consistently better performance than MAYU, which is chosen as the benchmark FDR calculation method for this study. The null distribution inference result shows that the logistic regression model achieves a reasonable result both in the shape of the null distribution and in the corresponding FDR estimation.
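The p-value and BH steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the protein scoring function, the database permutation, and the exact p-value formula are not given in this text, so the code assumes that higher scores are better, that `null_scores` is a hypothetical pooled list of scores from the permuted-database searches, and that an add-one correction is used for the empirical p-value.

```python
from bisect import bisect_left


def empirical_p_values(observed_scores, null_scores):
    """Empirical p-value of each observed score against a null sample.

    Assumption: higher score = stronger evidence. The +1 correction
    (an assumption, not taken from the paper) avoids p-values of zero.
    """
    null_sorted = sorted(null_scores)
    n = len(null_sorted)
    p_values = []
    for score in observed_scores:
        # Number of null scores greater than or equal to the observed score.
        n_ge = n - bisect_left(null_sorted, score)
        p_values.append((n_ge + 1) / (n + 1))
    return p_values


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_identifications(adjusted, fdr):
    """Number of proteins accepted at the given FDR threshold."""
    return sum(1 for q in adjusted if q <= fdr)
```

Calling `count_identifications` over a range of thresholds gives the FDR-versus-identifications relationship that the abstract refers to, under the assumptions stated above.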

Highlights

Evaluating the statistical significance of the protein identification result is critical to the success of proteomics studies
In mass spectrometry-based proteomics, protein identification is an essential task
Experimental MS/MS spectra are searched against a sequence database to obtain a set of peptide-spectrum matches (PSMs) [2,3,4]

Summary

Introduction

Evaluating the statistical significance of the protein identification result is critical to the success of proteomics studies. Existing FDR estimation methods either rely on specific assumptions or use a two-stage calculation that first estimates the error rates at the peptide level and then combines them in some way at the protein level. The identification of proteins is itself a two-stage process: peptide identification and protein inference [1]. The ability to accurately infer proteins and to directly assess such inference results is critical to the success of proteomics studies. Many effective protein inference algorithms have been developed, such as ProteinProphet, ComByne and MSBayesPro. However, the accurate assessment of the statistical significance of protein identifications remains an open question [8, 9]. Past research efforts in this direction can be classified into p-value based approaches and false discovery rate (FDR) approaches.

Wu et al. BMC Genomics 2018, 19(Suppl 6):567



Journal: BMC Genomics
Publication Date: Aug 1, 2018
Citations: 9
License type: open-access


