Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

Eric W Deutsch, Zhi Sun, Luis Mendoza, Gilbert S Omenn, David S Campbell, Pierre-Alain Binz, Robert L Moritz, Terry Farrah, David Shteynberg

doi:10.1021/acs.jproteome.6b00445

Abstract

The results of shotgun proteomics mass spectrometry data analysis can be greatly affected by the choice of the reference protein sequence database against which the spectra are matched. For many species, there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem that is especially acute in human sample analysis. All of these sequence databases are genome-based, compiling the sequences of predicted genes and their protein translation products. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and to make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome (THISP) set. To evaluate the utility of these databases, we analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, against each of the four tiers of database complexity. Compared with the Tier 1 database, approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified with Tiers 2, 3, and 4, respectively, at a substantially increased computational cost. This extra cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study.
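Each tier is described as a union of sequences from several sources with redundancy removed. As a rough illustration only, the Python sketch below merges FASTA text from several sources by exact sequence identity and records every source header that carried each distinct sequence. The function names `parse_fasta` and `nonredundant_union` are hypothetical, and treating "nonredundant" as "identical amino acid string" is an assumption; the actual THISP build pipeline may apply different merging rules.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; wrapped lines are joined."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new entry begins; flush the previous one if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def nonredundant_union(sources: Iterable[str]) -> Dict[str, List[str]]:
    """Map each distinct sequence to every header (across all sources) that carries it.

    Assumption for illustration: two entries are redundant only if their
    amino acid strings are exactly identical.
    """
    union: Dict[str, List[str]] = {}
    for text in sources:
        for header, seq in parse_fasta(text):
            union.setdefault(seq, []).append(header)
    return union


# Two tiny hypothetical sources; A1 and B7 carry the same sequence.
source_a = ">A1 protein\nMKTAYIAK\nQR\n>A2\nMSEQ\n"
source_b = ">B7 same as A1\nMKTAYIAKQR\n>B8\nMNEW\n"
union = nonredundant_union([source_a, source_b])
print(len(union))  # 3
```

Keeping all contributing headers, rather than only the first, preserves the information about which sources support each sequence, which is useful when deciding which tier an entry belongs in.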
We find that it is useful to search a data set against a simpler database and then check the uniqueness of the discovered peptides against a more complex database. We have also set up an automated system that downloads all of the source databases on the first of each month, generates a new set of search databases, and makes them available for download at http://www.peptideatlas.org/thisp/ .
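The search-then-check strategy can be sketched as follows: peptides identified against a small tier are looked up in a larger tier to count how many protein entries contain them. This is a hypothetical, brute-force illustration (`proteins_containing` and `uniqueness_report` are invented names) using plain substring matching; a production tool would more likely use an index. Treating isoleucine and leucine as equivalent reflects the fact that they have identical masses, but whether the THISP authors do this is an assumption here.

```python
from typing import Dict, Iterable, List


def _normalize(seq: str, equate_il: bool) -> str:
    # I and L are isobaric, so optionally map both to L before matching.
    seq = seq.upper()
    return seq.replace("I", "L") if equate_il else seq


def proteins_containing(peptide: str, database: Dict[str, str],
                        equate_il: bool = True) -> List[str]:
    """Return the accessions of all proteins whose sequence contains the peptide."""
    target = _normalize(peptide, equate_il)
    return [acc for acc, seq in database.items()
            if target in _normalize(seq, equate_il)]


def uniqueness_report(peptides: Iterable[str], database: Dict[str, str],
                      equate_il: bool = True) -> Dict[str, List[str]]:
    """Map each peptide to every protein in the (larger) database that contains it."""
    return {p: proteins_containing(p, database, equate_il) for p in peptides}


# Hypothetical larger-tier entries: a canonical protein, an isoform, and a variant.
tier4 = {
    "P1": "MKTAYIAKQR",
    "P1-2": "MKTAYIAKQRGG",
    "V1": "MKTAYLAKQW",
}
report = uniqueness_report(["TAYIAK", "KQRGG"], tier4)
for peptide, hits in report.items():
    status = "unique" if len(hits) == 1 else f"shared by {len(hits)} entries"
    print(peptide, status, hits)
```

In this toy example, TAYIAK matches all three entries once I and L are equated, while KQRGG maps only to the isoform P1-2. With `equate_il=False`, TAYIAK would match only P1 and P1-2.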

Journal: Journal of Proteome Research
Publication Date: Sep 12, 2016
Citations: 26
