Abstract

BackgroundTandem mass spectrometry (MS/MS)-based database searching is a widely acknowledged and widely used method for peptide identification in shotgun proteomics. However, due to the rapid growth of spectra data produced by advanced mass spectrometry and the greatly increased number of modified and digested peptides identified in recent years, the current methods for peptide database searching cannot rapidly and thoroughly process large MS/MS spectra datasets. A breakthrough in efficient database search algorithms is crucial for peptide identification in computational proteomics.ResultsThis paper presents MCtandem, an efficient tool for large-scale peptide identification on Intel Many Integrated Core (MIC) architecture. To support big data processing capability, a novel parallel match scoring algorithm, named MIC-SDP (spectrum dot product), and its two-level parallelization are presented in MCtandem’s design. In addition, a series of optimization strategies on both the host CPU side and the MIC side, which includes pre-fetching, optimized communication overlapping scheme, multithreading and hyper-threading, are exploited to improve the execution performance.ConclusionsFor fair comparisons, we first set up experiments and verified the 28 fold times speedup on a single MIC against the original CPU-based implementation. We then execute the MCtandem for a very large dataset on an MIC cluster (a component of the Tianhe-2 supercomputer) and achieved much higher scalability than in a benchmark MapReduce-based programs, MR-Tandem. MCtandem is an open-source software tool implemented in C++. The source code and the parameter settings are available at https://github.com/LogicZY/MCtandem.

Highlights

  • Tandem mass spectrometry (MS/MS)-based database searching is a widely acknowledged and widely used method for peptide identification in shotgun proteomics

  • Without the development of more powerful and efficient peptide database searching methods, we can expect computational bottlenecks to limit the scope of discoveries to small-scale MS/MS spectra data

  • A breakthrough in efficient database search algorithms is crucial for large-scale peptide identification, especially entire human proteome analysis, in computational proteomics

Read more

Summary

Introduction

Tandem mass spectrometry (MS/MS)-based database searching is a widely acknowledged and widely used method for peptide identification in shotgun proteomics. Due to the rapid growth of spectra data produced by advanced mass spectrometry and the greatly increased number of modified and digested peptides identified in recent years, the current methods for peptide database searching cannot rapidly and thoroughly process large MS/MS spectra datasets. A breakthrough in efficient database search algorithms is crucial for peptide identification in computational proteomics. Database search-based peptide identification, which aims to retrieve all candidate sequences from a specified protein sequence database for each tandem mass spectrometry (MS/MS) spectrum, is existing peptide database search tools still suffer from low computational efficiency due to a number of limitations. Without the development of more powerful and efficient peptide database searching methods, we can expect computational bottlenecks to limit the scope of discoveries to small-scale MS/MS spectra data. A breakthrough in efficient database search algorithms is crucial for large-scale peptide identification, especially entire human proteome analysis, in computational proteomics

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call