Abstract

BackgroundSequence alignment algorithms are a key component of many bioinformatics applications.Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations.ResultsA faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375 residue query sequence a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar’s ‘striped’ approach. Rognes’s SWIPE optimal database search application is still generally the fastest available at 1.2 to at best 2.4 times faster than Parasail for sequences shorter than 500 amino acids. However, Parasail was faster for longer sequences. For global alignments, Parasail’s prefix scan implementation is generally the fastest, faster even than Farrar’s ‘striped’ approach, however the opal library is faster for single-threaded applications. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE41, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license.ConclusionsApplications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-016-0930-z) contains supplementary material, which is available to authorized users.

Highlights

  • Sequence alignment algorithms are a key component of many bioinformatics applications

  • When comparing the performance of various approaches and implementations, speed is reported in billion cell updates per second (GCUPS), where a cell is a value in the dynamic programming table

  • The Parasail library is an improvement over earlier single instruction multiple data (SIMD) intra-sequence implementations

Read more

Summary

Introduction

Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. Fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. Sequence alignment is an order-preserving way to map characters between two DNA or amino-acid (protein) sequences. It is a pervasive operation in bioinformatics workflows used to identify regions of high similarity between sequences. In addition to negative scores, alignments may be penalized by the insertion of gaps or deletion of characters. Gap penalties are often linear (a fixed negative value per gap) or affine [3], where the gap opening penalty is typically larger than the gap extension penalty

Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call