Mass spectrometry-based protein identification by integrating de novo sequencing with database searching

Penghao Wang, Susan R Wilson

doi:10.1186/1471-2105-14-s2-s24

Abstract

Background: Mass spectrometry-based protein identification is a very challenging task. The main identification approaches are de novo sequencing and database searching. Both approaches have shortcomings, so an integrative approach has been developed. The integrative approach first infers partial peptide sequences, known as tags, directly from tandem spectra through de novo sequencing, and then submits these sequences to a database search to see whether a close peptide match can be found. However, current implementations of this integrative approach have several limitations. First, simplistic de novo sequencing is applied and only very short sequence tags are used. Second, most integrative methods apply an algorithm similar to BLAST to search for exact sequence matches and do not accommodate sequence errors well. Third, in these methods the integrated de novo sequencing makes only a limited contribution to the scoring model, which is still largely based on database searching.

Results: We have developed a new integrative protein identification method that integrates de novo sequencing more efficiently into database searching. Evaluated on large real datasets, our method outperforms popular identification methods.
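The tag-then-search idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the function names (`extract_tags`, `tolerant_hits`), the mass tolerance, the toy spectrum, the toy peptide list, and the use of a simple mismatch count as the error-tolerant comparison are all assumptions made for illustration. Only the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da) for a subset of amino acids.
# L and I are isobaric, so "L" stands for either one here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
    "R": 156.10111,
}


def extract_tags(peaks, tol=0.02, min_len=3):
    """Infer sequence tags from peak mass differences (hypothetical helper).

    Assumes all peaks are singly charged ions of one series, so that
    consecutive ions differ by exactly one residue mass.
    """
    peaks = sorted(peaks)
    edges = {i: [] for i in range(len(peaks))}
    has_incoming = set()
    for i, low in enumerate(peaks):
        for j in range(i + 1, len(peaks)):
            diff = peaks[j] - low
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges[i].append((j, aa))
                    has_incoming.add(j)

    tags = set()

    def walk(i, tag):
        # Follow every residue-sized step; record the tag when the path ends.
        if not edges[i]:
            if len(tag) >= min_len:
                tags.add(tag)
            return
        for j, aa in edges[i]:
            walk(j, tag + aa)

    # Start only from peaks that do not continue an earlier path.
    for i in range(len(peaks)):
        if i not in has_incoming:
            walk(i, "")
    return tags


def tolerant_hits(tag, peptides, max_mismatches=1):
    """Return (peptide, offset, mismatches) for the best window per peptide."""
    hits = []
    for pep in peptides:
        best = None
        for start in range(len(pep) - len(tag) + 1):
            window = pep[start:start + len(tag)]
            mismatches = sum(a != b for a, b in zip(tag, window))
            if best is None or mismatches < best[1]:
                best = (start, mismatches)
        if best is not None and best[1] <= max_mismatches:
            hits.append((pep, best[0], best[1]))
    return hits


# Toy spectrum: a ladder spelling A, V, L, E from an arbitrary 100.0 offset,
# plus one noise peak at 305.5.
spectrum = [100.0, 171.03711, 270.10552, 305.5, 383.18958, 512.23217]
database = ["GSAVLEK", "TPAVNEK", "FDGRK"]  # hypothetical peptides

tags = extract_tags(spectrum)
exact = tolerant_hits("AVLE", database, max_mismatches=0)
tolerant = tolerant_hits("AVLE", database, max_mismatches=1)
```

With exact matching, only the peptide that contains the tag verbatim is returned; allowing one mismatch also retrieves a peptide that differs from the tag at a single position, which is the kind of sequence error an exact-match search would miss.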

Highlights

Mass spectrometry-based protein identification is a very challenging task.
Evaluation strategy (datasets): To evaluate the performance of our method, we use the raw spectra from two large-scale datasets as a benchmark: (1) the Aurum dataset [25] and (2) the CPTAC dataset [26] from Clinical Proteomic Technologies Assessment for Cancer.
The CPTAC dataset comes from a large-scale study of the reproducibility and repeatability of the Universal Proteomics Standard Set 1 (UPS1).

Summary

Introduction

Mass spectrometry-based protein identification is a very challenging task. The main identification approaches are de novo sequencing and database searching. Most integrative methods apply an algorithm similar to BLAST to search for exact sequence matches and do not accommodate sequence errors well. In these methods the integrated de novo sequencing makes only a limited contribution to the scoring model, which is still largely based on database searching. Accurate identification of proteins from tandem mass spectra is a very challenging task, and existing methods can typically identify fewer than 50% of the proteins in a complex sample [1-3]. Despite having the advantage of robustness, the database search approach has several limitations. It is only effective if the proteins of interest are already known and the database used contains the correct protein sequences. Specifying the enzyme used in the proteolytic digestion can exclude the correct peptides from the database search space and lead to erroneous identifications [10].
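The point about enzyme specification can be made concrete with a minimal in-silico digestion sketch. It assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P); the protein sequence and the helper names `tryptic_peptides` and `unrestricted_peptides` are hypothetical and are not taken from the paper.

```python
def tryptic_peptides(protein, missed_cleavages=0):
    """Peptides produced by an idealised tryptic digest (hypothetical helper)."""
    # Cleavage positions: after K or R, except when followed by P.
    sites = [0]
    for i in range(len(protein) - 1):
        if protein[i] in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))

    peptides = set()
    for a in range(len(sites) - 1):
        # Join up to `missed_cleavages` extra adjacent fragments.
        last = min(a + 2 + missed_cleavages, len(sites))
        for b in range(a + 1, last):
            peptides.add(protein[sites[a]:sites[b]])
    return peptides


def unrestricted_peptides(protein, min_len=5, max_len=8):
    """Every substring in a length range, i.e. no enzyme constraint."""
    return {
        protein[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(protein) - n + 1)
    }


protein = "MKTAYIAKQRPLVE"      # made-up sequence for illustration
observed = "AYIAK"              # a peptide with a non-tryptic N-terminus

strict = tryptic_peptides(protein)
relaxed = unrestricted_peptides(protein)

print(sorted(strict))
print(observed in strict, observed in relaxed)
```

Under the strict tryptic rule the candidate set is small, but a peptide produced by non-specific cleavage, such as `AYIAK` here, is absent from it and can only be matched when the enzyme constraint is relaxed, at the cost of a much larger search space.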

Journal: BMC Bioinformatics
Publication Date: Jan 1, 2013
Citations: 45
License type: CC BY 2.0

