Abstract

The PRoteomics IDEntifications (PRIDE) database is a large public proteomics data repository, containing over 270 million mass spectra (as of November 2011). PRIDE is an archival database, providing the proteomics data supporting specific scientific publications in a computationally accessible manner. While PRIDE faces rapid increases in both the size and the number of data depositions, the major challenge is to ensure high-quality depositions in the context of highly diverse proteomics workflows and data representations. Here, we describe the PRIDE curation pipeline and its practical application in quality control of complex data depositions. Database URL: http://www.ebi.ac.uk/pride/.

Highlights

  • Proteomics can be defined as ‘the study of the subsets of proteins present in different parts of the organism and how they change with time and varying conditions’ (1)

  • The situation is already improving significantly as a result of the Human Proteome Organization Proteomics Standards Initiative (PSI) developing the standard formats mzML (2) and mzIdentML (3), which are increasingly implemented by instrument and search engine producers

  • While the generation and public availability of proteomics data are still several orders of magnitude smaller than those of, e.g., genomics data, both the quantity and the complexity of proteomics data sets deposited in the PRoteomics IDEntifications (PRIDE) database are rapidly increasing



Original article

Attila Csordas*, David Ovelleiro, Rui Wang, Joseph M.

Introduction
The PRIDE curation pipeline
Frequent data quality issues
PRIDE curation snippets
Conclusion
