SW-Tandem: a highly efficient tool for large-scale peptide identification with parallel spectrum dot product on Sunway TaihuLight.

Chuang Li,Kenli Li,Yunping Zhu,Qiang He,Tao Chen,Janet Kelso

doi:10.1093/bioinformatics/btz147

Abstract

Tandem mass spectrometry based database searching is a widely acknowledged and adopted method that identifies peptide sequence in shotgun proteomics. However, database searching is extremely computationally expensive, which can take days even weeks to process a large spectra dataset. To address this critical issue, this paper presents SW-Tandem, a new tool for large-scale peptide sequencing. SW-Tandem parallelizes the spectrum dot product scoring algorithm and leverages the advantages of Sunway TaihuLight, the No. 1 supercomputer in the world in 2017. Sunway TaihuLight is powered by the brand new many-core SW26010 processors and provides a peak computation performance greater than 100PFlops. To fully utilize the Sunway TaihuLights capacity, SW-Tandem employs three mechanisms to accelerate large-scale peptide identification, memory-access optimizations, double buffering and vectorization. The results of experiments conducted on multiple datasets demonstrate the performance of SW-Tandem against three state-of-the-art tools for peptide identification, including X!! Tandem, MR-Tandem and MSFragger. In addition, it shows high scalability in the experiments on extremely large datasets sized up to 12GB. SW-Tandem is an open source software tool implemented in C++. The source code and the parameter settings are available at https://github.com/Logic09/SW-Tandem. Supplementary data are available at Bioinformatics online.

Full Text