Abstract

Background: The target-decoy strategy effectively estimates the false-discovery rate (FDR) by creating a decoy database with a size identical to that of the target database. Decoy databases are created by various methods, such as the reverse, pseudo-reverse, shuffle, pseudo-shuffle, and de Bruijn methods. The FDR is sometimes over- or under-estimated depending on which decoy database is used, because the ratio of redundant peptides differs between target databases; that is, the numbers of unique (non-redundant) peptides in the target and decoy databases differ.

Results: We used two protein databases (the UniProt Saccharomyces cerevisiae protein database and the UniProt human protein database) to compare the FDRs of various decoy databases. When the ratio of redundant peptides in the target database is low, the FDR is not overestimated by any decoy construction method. However, if the ratio of redundant peptides in the target database is high, the FDR is overestimated when the (pseudo) shuffle decoy database is used. Additionally, the human and S. cerevisiae six-frame translation databases, which are large databases, showed outcomes similar to those from the UniProt human protein database.

Conclusion: The FDR must be estimated using the correction factor proposed by Elias and Gygi or that proposed by Kim et al. when (pseudo) shuffle decoy databases are used.
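The decoy constructions named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a simplified tryptic digestion (cleavage after every K or R, no missed cleavages, no proline rule), it assumes that each peptide occurrence is shuffled independently, and it omits the de Bruijn method. All function names and the two toy protein sequences are hypothetical.

```python
import random
import re


def tryptic_peptides(protein):
    """Split a protein after every K or R (simplified trypsin rule, assumed)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+", protein)


def reverse_decoy(protein):
    """Reverse method: reverse the whole protein sequence."""
    return protein[::-1]


def shuffle_decoy(protein, rng):
    """Shuffle method: randomly permute the whole protein sequence."""
    residues = list(protein)
    rng.shuffle(residues)
    return "".join(residues)


def _per_peptide(protein, transform):
    """Apply transform to each peptide, keeping a C-terminal K/R in place."""
    out = []
    for pep in tryptic_peptides(protein):
        if pep[-1] in "KR":
            out.append(transform(pep[:-1]) + pep[-1])
        else:
            out.append(transform(pep))
    return "".join(out)


def pseudo_reverse_decoy(protein):
    """Pseudo-reverse method: reverse each peptide, cleavage site fixed."""
    return _per_peptide(protein, lambda s: s[::-1])


def pseudo_shuffle_decoy(protein, rng):
    """Pseudo-shuffle method: shuffle each peptide, cleavage site fixed."""
    def shuffled(s):
        residues = list(s)
        rng.shuffle(residues)
        return "".join(residues)
    return _per_peptide(protein, shuffled)


def unique_peptides(proteins):
    """Set of distinct peptides over a whole database."""
    return {pep for prot in proteins for pep in tryptic_peptides(prot)}


# Toy target database: the two proteins share the peptides MAK and LLR.
target = ["MAKLLRGK", "MAKLLRTT"]
rng = random.Random(0)  # fixed seed only for repeatability

decoys = {
    "reverse": [reverse_decoy(p) for p in target],
    "pseudo-reverse": [pseudo_reverse_decoy(p) for p in target],
    "shuffle": [shuffle_decoy(p, rng) for p in target],
    "pseudo-shuffle": [pseudo_shuffle_decoy(p, rng) for p in target],
}

print("target unique peptides:", len(unique_peptides(target)))
for name, db in decoys.items():
    print(name, "unique peptides:", len(unique_peptides(db)))
```

In this sketch the pseudo-reverse method maps identical target peptides to identical decoy peptides, so the unique-peptide count is preserved. The shuffle-based methods can map repeated peptides to different decoy peptides, which is one way the target and decoy unique-peptide counts can diverge.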

Highlights

  • One of the most important steps in peptide identification is to estimate the false discovery rate (FDR)

  • We denote the results with 1% false-discovery rate (FDR) from the reverse, shuffle, pseudo-reverse, pseudo-shuffle, and de Bruijn methods as FDR_R, FDR_S, FDR_PR, FDR_PS, and FDR_D, respectively

  • Saccharomyces cerevisiae dataset: We compared the results for the peptide-spectrum matches (PSMs) identified at 1% FDR using the S. cerevisiae Elite and 2DLC dataset, the protein database, and various decoy databases

Summary

Introduction

One of the most important steps in peptide identification is estimating the false discovery rate (FDR). Two approaches have been suggested for this: the target-decoy strategy [1] and the mixture model-based method [2, 3]. The target-decoy strategy is frequently used because it is effective and easy to implement [1]. It estimates the FDR by creating a decoy database that is identical in size to the target database. The most frequently used construction is the reverse method, which creates a decoy database by reversing the …
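As a rough illustration of how a target-decoy search yields an FDR estimate, the sketch below uses one commonly used uncorrected form: the number of decoy matches divided by the number of target matches above a score threshold. This is an assumption for illustration only. The correction factors of Elias and Gygi and of Kim et al. are not reproduced here, and the function names and the `(score, is_decoy)` input layout are hypothetical.

```python
def estimate_fdr(psms, threshold):
    """Estimate FDR as decoy hits / target hits at or above a score threshold.

    psms: iterable of (score, is_decoy) pairs (hypothetical layout).
    Returns 0.0 when no target match passes the threshold.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest observed score whose estimated FDR is <= max_fdr.

    Returns None if no observed score satisfies the limit.
    """
    best = None
    for score in sorted({s for s, _ in psms}, reverse=True):
        if estimate_fdr(psms, score) <= max_fdr:
            best = score
    return best


# Toy peptide-spectrum matches: (score, is_decoy)
psms = [(10, False), (9, False), (8, True), (7, False)]
print(estimate_fdr(psms, 7))
print(threshold_at_fdr(psms, max_fdr=0.01))
```

Because the estimate depends directly on the decoy count, a decoy database with more unique peptides than the target database can inflate it, which is the situation the correction factors are meant to address.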
