Abstract

Biological sequence alignment is a fundamental application in bioinformatics. It can be used to identify functionally conserved sequences and to infer evolutionary relationships between species. To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences and accurate enough to correctly align the conserved biological features between distant species. Global alignments are important because they reveal the shared order of biological features in the compared species and produce a more accurate alignment at the base-pair level when the features are in the same order. The best-known global alignment algorithm is Needleman-Wunsch. More recently, BitPAl, a bit-parallel method for global alignment with general integer scoring, provided a new implementation of the Needleman-Wunsch algorithm (BitNW). Compared with the original Needleman-Wunsch algorithm, BitNW is significantly faster because it exploits bit parallelism. A number of parallel strategies have been proposed to accelerate exact alignment methods. However, most of them fail to align long biological sequences because of their quadratic time complexity. In this paper, we propose SLPal, a fast bit-parallel algorithm for accelerating long DNA sequence comparison on Intel many-core and multi-core architectures. To fully exploit the computing power of many cores and the 512-bit vector processing units (VPUs), we use a two-level parallelism scheme: a coarse-grained thread level and a fine-grained VPU level. At the thread level, the alignment scoring matrix is split into small tiles, and multiple threads process these tiles concurrently using the Intel TBB library. At the VPU level, the computing kernels are implemented with Single Instruction Multiple Data (SIMD) instructions, so that the 16 independent integers residing in a 512-bit vector register can be processed simultaneously.
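The tile-level decomposition described above can be illustrated with a minimal sketch. This is not the SLPal implementation: it uses the plain Needleman-Wunsch recurrence with a linear gap penalty rather than a bit-parallel kernel, and Python threads stand in for Intel TBB tasks purely to show the dependency structure. The function name `nw_score_tiled`, the scoring defaults, and the tile size are hypothetical choices made for this example. The point it shows is that a tile depends only on its upper, left, and upper-left neighbours, so all tiles on the same anti-diagonal of the tile grid can be processed concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def nw_score_tiled(a, b, match=1, mismatch=-1, gap=-1, tile=4):
    """Global alignment score (linear gap penalty), filled tile by tile."""
    n, m = len(a), len(b)
    # Full (n+1) x (m+1) score matrix; row 0 and column 0 hold pure-gap prefixes.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap

    # Number of tiles along each dimension (the last tile may be partial).
    tile_rows = (n + tile - 1) // tile
    tile_cols = (m + tile - 1) // tile

    def fill_tile(pos):
        """Fill one tile with the standard Needleman-Wunsch recurrence."""
        ti, tj = pos
        for i in range(ti * tile + 1, min((ti + 1) * tile, n) + 1):
            for j in range(tj * tile + 1, min((tj + 1) * tile, m) + 1):
                s = match if a[i - 1] == b[j - 1] else mismatch
                H[i][j] = max(H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              H[i - 1][j] + gap,     # gap in b
                              H[i][j - 1] + gap)     # gap in a

    with ThreadPoolExecutor() as pool:
        # Wave d holds every tile (ti, tj) with ti + tj == d. These tiles write
        # disjoint cells and read only cells finished in earlier waves.
        for d in range(tile_rows + tile_cols - 1):
            lo = max(0, d - tile_cols + 1)
            hi = min(d, tile_rows - 1)
            wave = [(ti, d - ti) for ti in range(lo, hi + 1)]
            list(pool.map(fill_tile, wave))  # wait for the whole wave

    return H[n][m]


print(nw_score_tiled("ACGT", "ACGT", tile=2))  # 4
```

Because of the interpreter lock, this Python version gains no speed from the threads; it only demonstrates which tiles are safe to run together. A native implementation would also avoid storing the full matrix for long sequences, which this sketch does for simplicity.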
The evaluation shows that our algorithm achieves stable performance on all benchmark data and yields up to 511.7 (617.2) GCUPS on a server with a single Xeon Phi 7210 processor (dual Xeon Gold 6148 20-core processors). Furthermore, our tests show that SLPal can align two sequences of about 5 million bp each in 50 seconds on our server equipped with dual Xeon Gold 6148 CPUs.
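As a rough consistency check on these figures, GCUPS (giga cell updates per second) is conventionally the number of scoring-matrix cells divided by the run time in seconds and by 10^9. The helper below is a hypothetical illustration, and the inputs are the rounded values quoted above ("about 5 million" bp, 50 seconds), so the result is only approximate.

```python
def gcups(len_a, len_b, seconds):
    """Giga cell updates per second for an len_a x len_b scoring matrix."""
    return (len_a * len_b) / (seconds * 1e9)


# Two sequences of about 5 million bp each, aligned in about 50 seconds.
print(gcups(5_000_000, 5_000_000, 50.0))  # 500.0
```

Roughly 500 GCUPS for the long-sequence run is in line with the reported peak of 617.2 GCUPS on the dual Xeon Gold 6148 server.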
