Speeding up tandem mass spectrometry-based database searching by longest common prefix

Chen Zhou, Rui-Xiang Sun, Hao Chi, Si-Min He, Yan-Jie Wu, Yan Fu, You Li, Le-Heng Wang

doi:10.1186/1471-2105-11-577

Abstract

Background: Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the remarkable increase in computational demand, brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Some software tools use peptide indexing to accelerate processing. However, constructing a peptide index requires a large amount of time and space, especially for non-specific digestion. It is also inflexible to use.

Results: We developed an algorithm based on the longest common prefix (ABLCP) to efficiently organize a protein sequence database. The longest common prefix is a data structure that is typically coupled with the suffix array. ABLCP eliminates redundant candidate peptides in databases and reduces the number of peptide-spectrum matches that must be computed, thereby decreasing the identification time. The algorithm relies on a property of the longest common prefix. Although enzymatic digestion poses a challenge to this property, the algorithm can be adjusted to ensure that no candidate peptides are omitted. Compared with peptide indexing, ABLCP requires much less time and space for construction and is subject to fewer restrictions.

Conclusions: The ABLCP algorithm can help to improve data analysis efficiency. A software tool implementing this algorithm is available at http://pfind.ict.ac.cn/pfind2dot5/index.htm
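
To illustrate how a longest common prefix (LCP) array paired with a suffix array can expose redundant candidate peptides, here is a minimal Python sketch. It is not the authors' ABLCP implementation: the function names, the `$` separator between proteins, the fixed peptide length, and the toy sequences are all assumptions made for illustration, and the suffix array is built naively rather than with a production-grade algorithm. The idea shown is only the general property that, in suffix-sorted order, a length-k peptide repeats the previous one whenever the LCP value is at least k.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a toy example only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp_array(text, sa):
    # Kasai's algorithm: lcp[r] is the length of the common prefix shared by
    # the suffixes at sa[r - 1] and sa[r]; lcp[0] is 0 by convention.
    n = len(text)
    rank = [0] * n
    for r, pos in enumerate(sa):
        rank[pos] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def unique_peptides(text, sa, lcp, length, sep="$"):
    # Walk suffixes in sorted order and emit each length-`length` prefix once.
    # If lcp[r] >= length, the prefix equals the previous suffix's prefix,
    # so it is a redundant candidate and is skipped.
    result = []
    for r, pos in enumerate(sa):
        peptide = text[pos:pos + length]
        if len(peptide) < length or sep in peptide:
            continue  # too short, or spans a protein boundary
        if r > 0 and lcp[r] >= length:
            continue  # duplicate of the peptide emitted just before
        result.append(peptide)
    return result


# Hypothetical toy database: two proteins joined by the assumed separator.
database = "MKPEPTIDEK$AKPEPTIDER$"
suffix_array = build_suffix_array(database)
lcp_array = build_lcp_array(database, suffix_array)
peptides = unique_peptides(database, suffix_array, lcp_array, length=7)
print(peptides)
```

In this sketch the shared stretch `PEPTIDE` occurs in both toy proteins but is emitted only once, so a downstream scorer would match it against spectra a single time.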

Highlights

Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification
We propose ABLCP, an algorithm based on the longest common prefix, to organize the database efficiently while retaining the advantages and avoiding the drawbacks of existing approaches such as peptide indexing
Because ABLCP uses online digestion, it is subject to fewer restrictions

Summary

Introduction

Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the remarkable increase in computational demand, brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Existing tools are not fast enough, for several reasons. First, protein databases are growing significantly, which yields many more peptides. Semi- or non-specific digestion generates 10 to 100 times more peptides than full-specific digestion. The number of non-redundant peptides generated by full-specific digestion with up to two missed cleavage sites in the IPI-Human V3.65 database [11] is 3549956, and it increases roughly 170-fold to 626871441 for non-specific digestion.
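
The growth in candidate peptides under non-specific digestion can be seen on a small scale with the following Python sketch. It makes simplifying assumptions that are not taken from the paper: the hypothetical functions `tryptic_peptides` and `nonspecific_peptides`, a trypsin-like rule that cleaves after every K or R (ignoring any proline exception), no peptide length or mass limits, and a made-up seven-residue sequence.

```python
def tryptic_peptides(protein, max_missed=2):
    # Assumed rule: cleave after every K or R, allowing up to `max_missed`
    # missed cleavage sites inside a peptide.
    cuts = [0]
    cuts += [i + 1 for i, aa in enumerate(protein[:-1]) if aa in "KR"]
    cuts.append(len(protein))
    peptides = set()
    for a in range(len(cuts) - 1):
        last = min(a + 2 + max_missed, len(cuts))
        for b in range(a + 1, last):
            peptides.add(protein[cuts[a]:cuts[b]])
    return peptides


def nonspecific_peptides(protein):
    # Non-specific digestion: every contiguous substring is a candidate.
    n = len(protein)
    return {protein[i:j] for i in range(n) for j in range(i + 1, n + 1)}


toy_protein = "AKCRDEK"  # hypothetical sequence, for illustration only
specific = tryptic_peptides(toy_protein)
nonspecific = nonspecific_peptides(toy_protein)
print(len(specific), len(nonspecific))  # prints "6 27"
```

Even for this tiny sequence the non-specific candidate set is several times larger than the full-specific one, and the gap widens quickly as sequences lengthen, which is the effect the database-scale figures above describe.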



Journal: BMC Bioinformatics
Publication Date: Nov 25, 2010
Citations: 40
License type: cc-by
