Analysis of the tryptic search space in UniProt databases.

Emanuele Alpi, Daniel Ríos, Alan Wilter Sousa Da Silva, Claire O'Donovan, Hermann Zellner, Johannes Griss, Juan Antonio Vizcaíno, Ricardo Antunes, Benoit Bely, Maria J Martin

doi:10.1002/pmic.201400227

Abstract

In this article, we provide a comprehensive study of the content of the Universal Protein Resource (UniProt) protein data sets for human and mouse. The tryptic search spaces of the UniProtKB (UniProt knowledgebase) complete proteome sets were compared with other data sets from UniProtKB and with the corresponding International Protein Index, reference sequence, Ensembl, and UniRef100 (where UniRef is UniProt reference clusters) organism-specific data sets. All protein forms annotated in UniProtKB (both the canonical sequences and isoforms) were evaluated in this study. In addition, natural and disease-associated amino acid variants annotated in UniProtKB were included in the evaluation. The peptide unicity was also evaluated for each data set. Furthermore, the peptide information in the UniProtKB data sets was compared against the available peptide-level identifications in the main MS-based proteomics repositories. The peptides observed in these repositories are an important source of information for protein databases because they provide supporting evidence for the existence of otherwise predicted proteins. Likewise, the repositories could use the information available in UniProtKB to direct reprocessing efforts on specific sets of peptides/proteins of interest. In summary, we provide comprehensive information about the different organism-specific sequence data sets available from UniProt, together with the pros and cons of each, in terms of search space for MS-based bottom-up proteomics workflows. The aim of the analysis is to provide a clear view of the tryptic search space of UniProt and other protein databases to enable scientists to select those most appropriate for their purposes.
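
To make the terms "tryptic search space" and "peptide unicity" concrete, the following is a minimal sketch and not the pipeline used in the study. It assumes the common simplified trypsin rule (cleave after K or R unless the next residue is P), treats a peptide as unique when it maps to exactly one protein entry in the data set, and uses hypothetical function names, toy sequences, and a hypothetical minimum peptide length of 6.

```python
from collections import defaultdict


def cleave(sequence):
    """Split a protein sequence after K or R, except when followed by P.

    This is a simplified trypsin rule assumed for illustration only.
    """
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal fragment that does not end in K or R.
        fragments.append(sequence[start:])
    return fragments


def tryptic_digest(sequence, missed_cleavages=0, min_length=6):
    """Return the set of tryptic peptides, allowing missed cleavages."""
    fragments = cleave(sequence)
    peptides = set()
    for i in range(len(fragments)):
        last = min(i + missed_cleavages + 1, len(fragments))
        for j in range(i, last):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.add(peptide)
    return peptides


def peptide_unicity(proteins, **digest_options):
    """Map each peptide to the entries containing it and list unique ones.

    `proteins` is a dict of {entry identifier: sequence}. A peptide counts
    as unique here if exactly one entry produces it.
    """
    owners = defaultdict(set)
    for entry_id, sequence in proteins.items():
        for peptide in tryptic_digest(sequence, **digest_options):
            owners[peptide].add(entry_id)
    unique = {pep for pep, ids in owners.items() if len(ids) == 1}
    return dict(owners), unique


if __name__ == "__main__":
    # Toy sequences, not real UniProtKB entries.
    toy = {"P1": "MKTAYIAKQRQISFVK", "P2": "MKTAYIAKGGGGGGR"}
    owners, unique = peptide_unicity(toy)
    print(len(owners), "distinct peptides,", len(unique), "unique")
```

Under these assumptions, the size of `owners` approximates the search space of a data set, and the share of peptides in `unique` is one way to express its unicity.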

Highlights

Most of the current MS-based bottom-up proteomics workflows make use of collections of sequences to match peptide sequences to experimental spectra and to infer the proteins to which those peptides belong [1].
UPI data sets contain additional peptide-level sequence information compared to the CPI data sets for human and mouse (Table 4; see Supporting Information Notes “Universal Protein Resource (UniProt) complete proteomes and other sequences”).
(iii) If variation data need to be considered in a given study, a variant-expanded data set is the proper choice.
A further option is to restrict the variation to the subset directly linked to disease, which yields a variant-expanded data set that focuses on detrimental variations and has lower sequence redundancy.
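
The idea of a variant-expanded data set, and of its disease-only subset, can be sketched as follows. This is an illustration under stated assumptions, not the procedure used by the authors: each variant is a single amino acid substitution, one extra sequence is produced per variant, and the `Variant` type, its `disease` flag, and the toy sequence are hypothetical.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    position: int   # 1-based position in the canonical sequence
    ref: str        # residue expected at that position
    alt: str        # substituted residue
    disease: bool   # True if the variant is annotated as disease-linked


def apply_variant(sequence: str, variant: Variant) -> str:
    """Return the sequence carrying a single amino acid substitution."""
    index = variant.position - 1
    if sequence[index] != variant.ref:
        raise ValueError(
            f"expected {variant.ref} at position {variant.position}, "
            f"found {sequence[index]}"
        )
    return sequence[:index] + variant.alt + sequence[index + 1:]


def expand_with_variants(sequence: str, variants: List[Variant],
                         disease_only: bool = False) -> List[str]:
    """Produce one variant sequence per selected variant.

    With disease_only=True only disease-linked variants are applied,
    giving a smaller and less redundant expanded set.
    """
    selected = [v for v in variants if v.disease or not disease_only]
    return [apply_variant(sequence, v) for v in selected]


if __name__ == "__main__":
    # Toy data, not taken from UniProtKB.
    canonical = "MKTAYIAK"
    variants = [Variant(3, "T", "S", True), Variant(5, "Y", "F", False)]
    print(expand_with_variants(canonical, variants))
    print(expand_with_variants(canonical, variants, disease_only=True))
```

Restricting the expansion to disease-linked variants adds fewer sequences to the search space, which is the trade-off this highlight describes.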

Summary

Introduction

Most of the current MS-based bottom-up proteomics workflows make use of collections of sequences to match peptide sequences to experimental spectra and to infer the proteins to which those peptides belong [1]. The Universal Protein Resource (UniProt; Uniprot.org) [2] is among the most used providers of protein sequences and functional annotation. The UniProt databases (DBs) include the UniProt knowledgebase (UniProtKB), which acts as the central hub for the collection of functional information on proteins, and the UniProt reference clusters (UniRef) [3], which merge closely related sequences based on sequence identity. UniProtKB consists of two sections: UniProtKB/Swiss-Prot, which is manually annotated and reviewed, and UniProtKB/TrEMBL, which is automatically annotated and unreviewed.

Journal: PROTEOMICS
Publication Date: Dec 3, 2014
Citations: 26
License type: CC BY 3.0
