Abstract

Background: The key to mass-spectrometry-based proteomics is peptide identification. A major challenge in peptide identification is to obtain realistic E-values when assigning statistical significance to candidate peptides.

Results: Using a simple scoring scheme, we propose a database search method, RAId_DbS, with theoretically characterized statistics. Taking into account possible skewness in the random variable distribution and the effect of finite sampling, we provide a theoretical derivation for the tail of the score distribution. For every experimental spectrum examined, we collect the scores of the candidate peptides in the database and find good agreement between the collected score statistics and our theoretical distribution. Using Student's t-tests, we quantify the degree of agreement between the theoretical distribution and the collected score statistics. The t-tests may be used to measure the reliability of reported statistics. When combined with the P-value reported for a peptide hit under a score distribution model, this new measure prevents exaggerated statistics. Another feature of RAId_DbS is its ability to detect multiple co-eluted peptides. The peptide identification performance and statistical accuracy of RAId_DbS are assessed and compared with those of several other search tools. The executables and data related to RAId_DbS are freely available upon request.
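To make the link between collected scores, P-values and E-values concrete, here is a minimal sketch of a generic empirical approach; it is not the RAId_DbS scoring function or its theoretical tail model. It assumes that higher scores are better, that `null_scores` holds the scores of all candidate peptides collected for one spectrum, and it uses the common relation E = N × P, where N is the number of candidates scored. The function names are hypothetical. Note that with N collected scores the empirical P-value cannot resolve values below 1/N, which is one reason a theoretically derived tail is useful for small P-values.

```python
from bisect import bisect_left


def empirical_tail_pvalue(score, null_scores):
    """Fraction of collected candidate scores that are >= score.

    Hypothetical helper: assumes higher scores are better and that
    null_scores contains the scores of all candidate peptides for one spectrum.
    """
    if not null_scores:
        raise ValueError("need at least one collected score")
    ranked = sorted(null_scores)
    # bisect_left gives the number of scores strictly below `score`.
    n_at_least = len(ranked) - bisect_left(ranked, score)
    return n_at_least / len(ranked)


def evalue(pvalue, n_candidates):
    """Expected number of candidates scoring at least this well by chance (E = N * P)."""
    return pvalue * n_candidates


if __name__ == "__main__":
    scores = [1.0, 2.0, 2.0, 3.0, 5.0]
    p = empirical_tail_pvalue(3.0, scores)
    print(p, evalue(p, len(scores)))  # 0.4 2.0
```

Sorting once and using `bisect_left` keeps each lookup logarithmic, which matters when the same spectrum's score list is queried for many hits.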

Highlights

  • Protein identification is the key to proteomics

  • In panel (A), the red staircase represents the histogram of scores computed using Eq (1) with w_i = 1, while the blue line represents the theoretical distribution predicted for peptides with n = 44 theoretical peaks

Summary

Introduction

As an indispensable component in mass spectrometry (MS) based protein identification, peptide identification through tandem MS (MS2) is usually aided by automated data analysis. Among available data analysis tools, methods based on database searches are most frequently used. Examples of database search methods directly reporting P- or E-values include, but are not limited to, Mascot [8], Sonar [9], InsPecT [10], OMSSA [11], and X!Tandem [12].
