Making proteomics data accessible and reusable: current state of proteomics databases and repositories.

Yasset Perez‐Riverol, Rui Wang, Henning Hermjakob, Juan Antonio Vizcaíno, Emanuele Alpi

doi:10.1002/pmic.201400302


Open Access


Journal: PROTEOMICS	Publication Date: Mar 1, 2015
Citations: 268	License type: CC BY 4.0

Affiliation: European Molecular Biology Laboratory

Abstract

Compared to other data-intensive disciplines such as genomics, public deposition and storage of MS-based proteomics data are still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. To address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, and the PRIDE database. There are also other useful (in many cases recently developed) resources such as ProteomicsDB, Mass Spectrometry Interactive Virtual Environment (MassIVE), Chorus, MaxQB, PeptideAtlas SRM Experiment Library (PASSEL), Model Organism Protein Expression Database (MOPED), and the Human Proteinpedia. In addition, the ProteomeXchange consortium has recently been established to enable better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we review each of the major proteomics resources independently, together with some tools that enable the integration, mining, and reuse of the data. We also discuss some of the major challenges and current pitfalls in the integration and sharing of the data.

Highlights

The aim of this review is to provide an up-to-date overview of the current state of proteomics data repositories and databases, offering a solid starting point for those who want to perform data submission and/or data mining.
Peptide and protein identifications are mapped to a comprehensive reference protein database (for the latest human builds, the searched database is a combination of UniProtKB/Swiss-Prot, Ensembl, and sequences from the International Protein Index (IPI)) and postprocessed using the Trans-Proteomic Pipeline (TPP) [72].
The major differences between Human Proteinpedia and other repositories are as follows: (i) it does not exclusively contain MS-derived data; (ii) data from proteomics experiments are viewed in the context of a protein–protein interaction resource (HPRD); (iii) it restricts the data to those derived from human tissues or cell lines; and (iv) data annotation related to various protein features can be done manually.
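The mapping step in the highlights above (identified peptides matched against one reference database built from several sources) can be illustrated with a minimal sketch. It assumes tiny in-memory protein sequences; the accessions and sequences are invented placeholders, and the helper names `build_combined_db` and `map_peptides` are hypothetical. It only performs exact substring matching and does not reproduce the statistical validation and protein inference that the TPP postprocessing adds.

```python
from collections import defaultdict

# Hypothetical toy entries standing in for sequences from the three sources
# named in the text; accessions and sequences are invented placeholders.
SOURCES = {
    "UniProtKB/Swiss-Prot": {"SP_TOY1": "MKTAYIAKQRQISFVKSHFSRQ"},
    "Ensembl": {"ENSP_TOY1": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"},
    "IPI": {"IPI_TOY1": "MSLLTEVETPIRNEWGCRCNDSSD"},
}


def build_combined_db(sources):
    """Merge per-source protein dictionaries into one searchable mapping.

    Keys are (source, accession) pairs so that entries from different
    sources never overwrite each other.
    """
    combined = {}
    for source, proteins in sources.items():
        for accession, sequence in proteins.items():
            combined[(source, accession)] = sequence
    return combined


def map_peptides(peptides, combined):
    """Return, for each peptide, every (source, accession) whose sequence contains it.

    Peptides with no match are kept with an empty list so that unmapped
    identifications remain visible.
    """
    mapping = defaultdict(list)
    for peptide in peptides:
        mapping[peptide]  # ensure unmapped peptides still get an entry
        for key, sequence in combined.items():
            if peptide in sequence:  # exact substring match only
                mapping[peptide].append(key)
    return dict(mapping)


if __name__ == "__main__":
    db = build_combined_db(SOURCES)
    for pep, hits in map_peptides(["AKQRQISFVK", "NEWGCR", "PEPTIDEX"], db).items():
        print(pep, hits)
```

Keying the combined database by source and accession keeps track of where each match came from, which matters when the same peptide maps to redundant entries from different sources (as the first toy peptide does here).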

Summary

Introduction

Compared to other data-intensive fields such as genomics, deposition and storage of original proteomics data in public resources have been less common [13]. This is regrettable, since proteomics studies are usually more complex than their genomics counterparts. The audience interested in proteomics data is very heterogeneous. It includes biologists elucidating the mechanisms of regulation of specific proteins, MS researchers improving current analytical methods, and computational biologists developing new software tools for the analysis and interpretation of the data [24].

Organization of proteomics repositories and databases

The PX consortium

Data submission and format support

Data mining and visualization

PASSEL

PeptideAtlas

MassIVE

Chorus

ProteomicsDB

3.10.1 Data submission and format support

3.10.2 Data mining and visualization

3.11.1 Data submission and format support

3.11.2 Data mining and visualization

3.12 Human Proteinpedia

3.12.1 Data submission and format support

3.12.2 Data mining and visualization

3.13.1 Data submission and format support

3.13.2 Data mining and visualization

3.14 Other proteomics resources

3.15 Proteomics information available through UniProt and neXtProt

Data reuse from public resources

Pitfalls and future challenges

Findings

Conclusions


Similar Papers

Proteomics Data Exchange and Storage: The Need for Common Standards and Public Repositories
Rafael C Jiménez ... Juan Antonio Vizcaíno
01 Jan 2013

Using Annotated Peptide Mass Spectrum Libraries for Protein Identification
R Craig ... D Fenyo
Journal of Proteome Research | VOL. 5
14 Jul 2006

Systematic Proteogenomic Approach To Exploring a Novel Function for NHERF1 in Human Reproductive Disorder: Lessons for Exploring Missing Proteins.
Keun Na ... Jaeseung Lim
Journal of Proteome Research | VOL. 16
01 Nov 2017

Using the Global Proteome Machine for Protein Identification
Ronald C. Beavis
01 Jan 2006
