Transfer posterior error probability estimation for peptide identification

Xinpei Yi, Yan Fu, Fuzhou Gong

doi:10.1186/s12859-020-3485-y



https://doi.org/10.1186/s12859-020-3485-y


Journal: BMC Bioinformatics
Publication date: May 4, 2020
Citations: 13
License type: open-access

Affiliation: University of Chinese Academy of Sciences

Abstract

Background: In shotgun proteomics, database searching of tandem mass spectra yields a large number of peptide-spectrum matches (PSMs), many of which are false positives. Quality control of PSMs is a multiple hypothesis testing problem, and the false discovery rate (FDR) and the posterior error probability (PEP) are the commonly used statistical confidence measures. PEP, also called local FDR, evaluates the confidence of individual PSMs and is therefore more desirable than FDR, which evaluates the global confidence of a collection of PSMs. PEP can be estimated by decomposing the null and alternative distributions of PSM scores, provided that sufficient data are available. However, many proteomic studies are interested in only a group (subset) of PSMs, e.g. those with specific post-translational modifications. The group can be very small, which makes direct PEP estimation from the group data alone inaccurate, especially in the high-score region where the score threshold is set. Using the whole set of PSMs to estimate the group PEP is also inappropriate, because the null and/or alternative distributions of the group can differ greatly from those of the combined scores.

Results: The transfer PEP algorithm is proposed to estimate the PEPs of peptide identifications in small groups more accurately. Transfer PEP derives the group null distribution from its empirical relationship with the combined null distribution, and estimates the group alternative distribution, as well as the null proportion, using an iterative semi-parametric method. Validated on both simulated and real proteomic data, transfer PEP showed markedly higher accuracy than the direct combined and separate PEP estimation methods.

Conclusions: We presented a novel approach to PEP estimation for small groups and implemented it for the peptide identification problem in proteomics. The methodology is in principle applicable to small-group PEP estimation problems in other fields.
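The abstract relies on the standard two-component mixture view of PSM scores, in which each score comes from a null (incorrect match) density f0 with proportion π0 or an alternative (correct match) density f1 with proportion 1 − π0, so that PEP(x) = π0 f0(x) / [π0 f0(x) + (1 − π0) f1(x)]. The following is a minimal sketch of that general relationship only, not the transfer PEP algorithm. The Gaussian components, π0 = 0.7 and the function names `normal_pdf`, `pep` and `fdr_above` are hypothetical choices for illustration. It also shows one common way to link the local and global measures: the FDR of an accepted set estimated as the mean PEP of its members.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def pep(x, pi0, f0, f1):
    """Posterior error probability (local FDR) of score x under the
    mixture f(x) = pi0 * f0(x) + (1 - pi0) * f1(x)."""
    null_part = pi0 * f0(x)
    alt_part = (1.0 - pi0) * f1(x)
    return null_part / (null_part + alt_part)

def fdr_above(scores, threshold, pi0, f0, f1):
    """Estimated FDR of the PSMs scoring >= threshold, computed as the
    mean PEP of the accepted PSMs (0.0 if nothing is accepted)."""
    accepted = [s for s in scores if s >= threshold]
    if not accepted:
        return 0.0
    return sum(pep(s, pi0, f0, f1) for s in accepted) / len(accepted)

# Hypothetical components: null scores ~ N(0, 1), correct-match scores ~ N(4, 1),
# with 70% of PSMs drawn from the null.
f0 = lambda x: normal_pdf(x, 0.0, 1.0)
f1 = lambda x: normal_pdf(x, 4.0, 1.0)
pi0 = 0.7
```

With these illustrative components, a score of 2.0 (equidistant from both means) gets a PEP of about 0.7, i.e. π0, while scores well above 4 get PEPs near zero. When a group is small, π0, f0 and f1 for that group are exactly the quantities that are hard to estimate directly, which is the problem the abstract describes.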

Highlights

The transfer posterior error probability (PEP) algorithm is proposed to estimate the PEPs of peptide identifications in small groups more accurately
In shotgun proteomics, database searching of tandem mass spectra yields a large number of peptide-spectrum matches (PSMs), many of which are false positives
Proteins are first digested into a peptide mixture that is analyzed by high-throughput tandem mass spectrometry (MS/MS), producing thousands to millions of MS/MS spectra in a typical experiment

Summary

Results

To validate the performance of the transfer PEP algorithm, we must know the theoretical distribution of the data so that the estimated PEP can be compared with the theoretical PEP. To evaluate the average performance of each estimation method across the S simulations, we calculated the mean and standard deviation (SD) of the mean squared error (MSE) between the estimates, $\widehat{\mathrm{PEP}}_G$, and the theoretical values, $\mathrm{PEP}_G$, for the top scores (Ratio = 1%, 5%, 10%, 20% and 100%). When the number of scores from $f_{G1}(x)$ was small (n = 1, 10, 20, 50), both the mean and the SD of the MSE were very large for the combined PEP, especially in the high-score regions. The separate PEP was much better, but still deviated from the theoretical $\mathrm{PEP}_G$ when the number of scores from $f_{G1}(x)$ was too small (n = 1, 10, 20), especially in the high-score regions. The simulated MS/MS spectra used here were part of the data used in [19].
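The evaluation metric above can be sketched as follows. This assumes that "top scores" means the highest-scoring Ratio percent of the group's PSMs (always at least one), that the MSE is computed over those scores within each simulation, and that the mean and SD are then taken across the S per-simulation MSE values. The function names `top_ratio_mse` and `summarize` are hypothetical, and the exact protocol in the paper may differ.

```python
import statistics

def top_ratio_mse(scores, pep_est, pep_true, percent):
    """MSE between estimated and theoretical PEPs over the top `percent`%
    of scores (highest scores first; at least one score is always used)."""
    n = len(scores)
    # Integer ceiling of percent * n / 100, avoiding floating-point rounding.
    k = max(1, (percent * n + 99) // 100)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top = order[:k]
    return sum((pep_est[i] - pep_true[i]) ** 2 for i in top) / k

def summarize(mse_values):
    """Mean and sample SD of the per-simulation MSE values (needs >= 2)."""
    return statistics.mean(mse_values), statistics.stdev(mse_values)

# Hypothetical usage over S simulations for Ratio = 1%, 5%, 10%, 20%, 100%,
# where `runs` holds (scores, estimated PEPs, theoretical PEPs) per simulation:
# for percent in (1, 5, 10, 20, 100):
#     mses = [top_ratio_mse(s, est, true, percent) for s, est, true in runs]
#     mean_mse, sd_mse = summarize(mses)
```

Restricting the MSE to the top fraction of scores focuses the comparison on the high-score region, where the acceptance threshold is set and where the abstract notes that small-group estimates are least reliable.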

Conclusions

Background

Method