Abstract

The target-decoy database approach is currently the method of choice for assessing the quality of protein search engines. Decoy versions of real peptides are generated and injected, with distinct labels, into the same database as the real peptides. The quality of a search engine's results is then assessed from the number of decoys retrieved as hits. In the Crux-Tide search engine, one of the fastest search engines currently available, the process of indexing and generating decoys is computationally expensive. In this paper, we analyze the serial algorithm in detail, identify opportunities for improvement, and then describe a parallel shared-memory solution using OpenMP. To completely break the dependencies in the serial algorithm, a hashing technique is used to localize the work. Together, the parallel solution and the hashing technique reduce the computation cost by approximately 70-80% using only a few threads. Beyond the parallelization, we redesign part of the serial code to make memory consumption more efficient: the parallel version can index the same files using around two-thirds of the memory that the serial version consumes. This solution could also support future distributed implementations of the Crux-Tide search phase, in which each parallel unit would rank the observed spectra independently.
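The abstract does not give implementation details, but the general idea of breaking dependencies by hashing can be illustrated with a minimal OpenMP sketch. The sketch below is an assumption, not the authors' code: peptides are hashed into per-bucket lists so that duplicates of the same sequence always land in the same bucket, and each bucket is then deduplicated and expanded with decoys by an independent thread without locks or shared containers. The decoy generator (sequence reversal) and all names are illustrative.

```cpp
// Minimal sketch of dependency-free decoy generation via hashing + OpenMP.
// Not the Crux-Tide implementation; names and the reversal-based decoy
// generator are assumptions for illustration only.
#include <algorithm>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>
#include <omp.h>

// Hypothetical decoy generator: reverse the peptide sequence.
static std::string make_decoy(const std::string& peptide) {
    return std::string(peptide.rbegin(), peptide.rend());
}

int main() {
    // Toy input standing in for the digested peptide list.
    std::vector<std::string> peptides = {"PEPTIDEK", "SAMPLEK", "PEPTIDEK", "PROTEINR"};

    const int num_buckets = omp_get_max_threads();
    std::vector<std::vector<std::string>> buckets(num_buckets);

    // Hash step: localize each peptide to one bucket, so duplicates of the
    // same sequence always end up in the same bucket.
    for (const auto& p : peptides)
        buckets[std::hash<std::string>{}(p) % num_buckets].push_back(p);

    std::vector<std::vector<std::string>> decoys(num_buckets);

    // Each bucket is deduplicated and its decoys generated independently,
    // so the loop body needs no locks or shared state.
    #pragma omp parallel for schedule(dynamic)
    for (int b = 0; b < num_buckets; ++b) {
        auto& bucket = buckets[b];
        std::sort(bucket.begin(), bucket.end());
        bucket.erase(std::unique(bucket.begin(), bucket.end()), bucket.end());
        for (const auto& p : bucket)
            decoys[b].push_back(make_decoy(p));
    }

    for (int b = 0; b < num_buckets; ++b)
        for (std::size_t i = 0; i < buckets[b].size(); ++i)
            std::printf("target=%s decoy=%s\n", buckets[b][i].c_str(), decoys[b][i].c_str());
    return 0;
}
```

Compile with an OpenMP-enabled compiler, e.g. `g++ -fopenmp sketch.cpp`. The key design point mirrored from the abstract is that hashing localizes all work on a given peptide to one thread, which is what removes the serial dependency.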
