MCtandem: an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture

Chuang Li,Keqin Li,Kenli Li,Feng Lin

doi:10.1186/s12859-019-2980-5

Abstract

Background: Tandem mass spectrometry (MS/MS)-based database searching is a widely acknowledged and widely used method for peptide identification in shotgun proteomics. However, due to the rapid growth of spectra data produced by advanced mass spectrometry and the greatly increased number of modified and digested peptides identified in recent years, the current methods for peptide database searching cannot rapidly and thoroughly process large MS/MS spectra datasets. A breakthrough in efficient database search algorithms is crucial for peptide identification in computational proteomics.

Results: This paper presents MCtandem, an efficient tool for large-scale peptide identification on Intel Many Integrated Core (MIC) architecture. To support big data processing capability, a novel parallel match scoring algorithm, named MIC-SDP (spectrum dot product), and its two-level parallelization are presented in MCtandem's design. In addition, a series of optimization strategies on both the host CPU side and the MIC side, including pre-fetching, an optimized communication overlapping scheme, multithreading and hyper-threading, are exploited to improve the execution performance.

Conclusions: For fair comparisons, we first set up experiments and verified a 28-fold speedup on a single MIC against the original CPU-based implementation. We then executed MCtandem on a very large dataset on an MIC cluster (a component of the Tianhe-2 supercomputer) and achieved much higher scalability than a benchmark MapReduce-based program, MR-Tandem. MCtandem is an open-source software tool implemented in C++. The source code and the parameter settings are available at https://github.com/LogicZY/MCtandem.
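The abstract names spectrum dot product (SDP) as the match score but does not spell it out. As a rough illustration only, the sketch below computes a plain, serial dot product between a binned experimental spectrum and binned theoretical spectra. It is not the MIC-SDP algorithm or MCtandem's code: the `Peak` type, the fixed bin width, and the helper names `bin_spectrum`, `spectrum_dot_product` and `best_candidate` are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical peak type for this sketch: m/z position and intensity.
struct Peak {
    double mz;
    double intensity;
};

// Bin peaks into a dense intensity vector. The bin width and bin count are
// assumed parameters; peaks outside the binned range are ignored.
std::vector<double> bin_spectrum(const std::vector<Peak>& peaks,
                                 double bin_width, std::size_t n_bins) {
    std::vector<double> bins(n_bins, 0.0);
    for (const Peak& p : peaks) {
        if (p.mz < 0.0) continue;
        std::size_t idx = static_cast<std::size_t>(p.mz / bin_width);
        if (idx < n_bins) bins[idx] += p.intensity;
    }
    return bins;
}

// Spectrum dot product: sum over bins of experimental * theoretical intensity.
double spectrum_dot_product(const std::vector<double>& experimental,
                            const std::vector<double>& theoretical) {
    std::size_t n = std::min(experimental.size(), theoretical.size());
    double score = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        score += experimental[i] * theoretical[i];
    }
    return score;
}

// Score one experimental spectrum against every candidate and return the
// index of the highest-scoring one (candidates.size() if there are none).
// Each iteration is independent of the others, which is the property a
// parallel implementation can exploit; this sketch stays serial.
std::size_t best_candidate(const std::vector<double>& experimental,
                           const std::vector<std::vector<double>>& candidates) {
    std::size_t best = candidates.size();
    double best_score = -1.0;  // intensities are assumed non-negative
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        double s = spectrum_dot_product(experimental, candidates[i]);
        if (s > best_score) {
            best_score = s;
            best = i;
        }
    }
    return best;
}
```

In this sketch a caller would bin the experimental spectrum once, bin each candidate's theoretical spectrum with the same width, and pass both to `best_candidate`. How MCtandem actually distributes this work across MIC cores and vector units is described in the paper, not here.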

Highlights

Tandem mass spectrometry (MS/MS)-based database searching is a widely acknowledged and widely used method for peptide identification in shotgun proteomics
Without the development of more powerful and efficient peptide database searching methods, we can expect computational bottlenecks to limit the scope of discoveries to small-scale MS/MS spectra data
A breakthrough in efficient database search algorithms is crucial for large-scale peptide identification, especially entire human proteome analysis, in computational proteomics

Summary

Introduction

Tandem mass spectrometry (MS/MS)-based database searching is a widely acknowledged and widely used method for peptide identification in shotgun proteomics. Due to the rapid growth of spectra data produced by advanced mass spectrometry and the greatly increased number of modified and digested peptides identified in recent years, the current methods for peptide database searching cannot rapidly and thoroughly process large MS/MS spectra datasets. Database search-based peptide identification aims to retrieve all candidate sequences from a specified protein sequence database for each tandem mass spectrometry (MS/MS) spectrum. Existing peptide database search tools still suffer from low computational efficiency due to a number of limitations. Without the development of more powerful and efficient peptide database searching methods, we can expect computational bottlenecks to limit the scope of discoveries to small-scale MS/MS spectra data. A breakthrough in efficient database search algorithms is crucial for large-scale peptide identification, especially entire human proteome analysis, in computational proteomics.
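To make the candidate retrieval step concrete, the following sketch selects all peptides whose mass lies within a tolerance window of a spectrum's precursor mass. Filtering by precursor mass is a common approach in database search engines, but it is an assumption here: the text does not say how MCtandem selects candidates. The `Peptide` type, the function name `candidates_in_window`, and the requirement that the database be pre-sorted by mass are all hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical database entry: a peptide sequence and its mass.
struct Peptide {
    std::string sequence;
    double mass;
};

// Return every peptide with |mass - precursor_mass| <= tolerance.
// Assumes `db` is sorted by mass in ascending order, so the window can be
// located with two binary searches instead of a full scan.
std::vector<Peptide> candidates_in_window(const std::vector<Peptide>& db,
                                          double precursor_mass,
                                          double tolerance) {
    if (tolerance < 0.0) return {};
    auto lo = std::lower_bound(
        db.begin(), db.end(), precursor_mass - tolerance,
        [](const Peptide& p, double m) { return p.mass < m; });
    auto hi = std::upper_bound(
        db.begin(), db.end(), precursor_mass + tolerance,
        [](double m, const Peptide& p) { return m < p.mass; });
    return std::vector<Peptide>(lo, hi);
}
```

Because every spectrum triggers one such lookup followed by a scoring pass over the returned candidates, the cost grows with both the number of spectra and the size of the database, which is the bottleneck the introduction describes.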

Journal: BMC Bioinformatics	Publication Date: Jul 17, 2019
Citations: 12	License type: open-access

Similar Papers

SW-Tandem: a highly efficient tool for large-scale peptide identification with parallel spectrum dot product on Sunway TaihuLight.
Chuang Li ... Tao Chen
Bioinformatics | VOL. 35
01 Mar 2019

Peptide Identification by Database Search of Mixture Tandem Mass Spectra
Jian Wang ... Nuno Bandeira
Molecular & Cellular Proteomics | VOL. 10
23 Aug 2011

Combining Results of Multiple Search Engines in Proteomics
David Shteynberg ... Eric W Deutsch
Molecular & Cellular Proteomics | VOL. 12
01 Sep 2013

Peptizer, a Tool for Assessing False Positive Peptide Identifications and Manually Validating Selected Results
Kenny Helsens ... Lennart Martens
Molecular & Cellular Proteomics | VOL. 7
01 Dec 2008
