Abstract

Background: Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases establishes the need for efficient parallel implementations of these operations on modern accelerators.

Results: This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm and to the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture: cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency.

Conclusions: Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes, for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.

Highlights

  • Computing alignments between two or more sequences is a common operation in computational molecular biology

  • Since biological sequence databases are continuously growing, finding fast solutions is of high importance

  • In this paper we investigate how a Xeon Phi-based compute cluster can be used as a computational platform to accelerate alignment algorithms based on dynamic programming for two applications: (i) database scanning of protein sequence databases with the Smith-Waterman algorithm, and

Summary

Introduction

Computing alignments between two or more sequences is a common operation in computational molecular biology. Calculating similarity scores between a given query protein sequence and all sequences of a database, and computing multiple sequence alignments, are two common tasks in bioinformatics. Both tasks include iterative calculations of pairwise local alignments as a basic building block. Recent examples of efficient parallelization on Xeon Phis include scientific computing [5], bioinformatics [6,7,8,9,10], and database operations [11]. Parallelization between Xeon Phis adds another level of message-passing-based parallelism. This level needs to consider data partitioning, load balancing, and task scheduling. The second-generation Xeon Phi processor, named “Knights Landing”, has already been announced.
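The pairwise local alignment building block mentioned above can be illustrated with a minimal, unoptimized sketch of Smith-Waterman scoring. This is not the paper's implementation: the function names `sw_score` and `scan_database`, the simple match/mismatch scores, and the linear gap penalty are assumptions chosen for brevity, whereas real protein scans typically use a substitution matrix and affine gap penalties.

```python
def sw_score(query, target, match=2, mismatch=-1, gap=1):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring parameters are illustrative assumptions, not the paper's.
    Only two rows of the dynamic programming matrix are kept in memory.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 of every row is 0
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] - gap        # gap in the target
            left = curr[j - 1] - gap  # gap in the query
            score = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def scan_database(query, database):
    """Score the query against every database sequence, best hits first.

    `database` is a hypothetical dict mapping sequence names to sequences.
    """
    hits = [(sw_score(query, seq), name) for name, seq in database.items()]
    return sorted(hits, reverse=True)
```

Each database sequence is scored independently of the others, which is the property that makes database scanning amenable to data-parallel distribution across nodes and threads.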
