Abstract

Sequence alignment is a long-standing problem in bioinformatics. The Basic Local Alignment Search Tool (BLAST) is one of the most popular and fundamental alignment tools. The explosive growth of biological sequence data calls for faster sequence alignment tools such as BLAST. To this end, we develop high-speed BLASTN (HS-BLASTN), a parallel and fast nucleotide database search tool that accelerates MegaBLAST, the default module of NCBI-BLASTN. HS-BLASTN builds a new lookup table using the FMD-index of the database and employs an accurate and effective seeding method to find short stretches of identity (called seeds) between the query and the database. HS-BLASTN produces the same alignment results as MegaBLAST while running much faster. Specifically, our experiments on a 12-core server show that HS-BLASTN can be 22 times faster than MegaBLAST and exhibits better parallel performance. HS-BLASTN is written in C++, and its source code is available at https://github.com/chenying2016/queries under the GPLv3 license.
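The abstract says HS-BLASTN indexes the database with an FMD-index. As we understand it, an FMD-index is an FM-index built over a DNA sequence together with its reverse complement, so exact matches can be extended in both directions. The sketch below shows only the underlying primitive, plain FM-index backward search for exact matches. It is not HS-BLASTN's lookup table or seeding method. The function names and the naive suffix-array construction are illustrative assumptions, fine for small inputs only.

```python
def build_fm_index(text):
    """Build a minimal FM-index: suffix array, C array, and occurrence table.

    A sentinel '$' (smaller than A/C/G/T) is appended. The suffix array is
    built by naive sorting, which is only suitable for a small sketch.
    """
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[k] is the character preceding suffix sa[k]; text[-1] is '$'.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def backward_search(pattern, C, occ, n):
    """Return the half-open suffix-array interval [lo, hi) matching pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):  # consume the pattern right to left
        if ch not in C:
            return 0, 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def locate(pattern, text):
    """All start positions of pattern in text, found via backward search."""
    sa, C, occ = build_fm_index(text)
    lo, hi = backward_search(pattern, C, occ, len(sa))
    return sorted(sa[lo:hi])
```

For example, `locate("ACG", "ACGTACGA")` returns `[0, 4]`. Each backward-search step costs two table lookups, so the number of steps depends on the pattern length rather than the database length, which is why FM-style indexes suit seeding against large databases.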

Highlights

  • Identifying sequences having statistically significant local alignments with a given query is routine in computational biology

  • We compare the performance of HS-BLASTN with that of MegaBLAST on each query set using different numbers of CPU threads

  • We define $T_M(q, n)/T_H(q, n)$ as the relative speedup achieved by HS-BLASTN over MegaBLAST when both tools run on query set $q$ with $n$ CPU threads, where $T_M(q, n)$ and $T_H(q, n)$ denote the running times of MegaBLAST and HS-BLASTN, respectively
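The relative speedup can be tabulated as in the following minimal sketch. It assumes $T_M(q, n)$ and $T_H(q, n)$ are the running times of MegaBLAST and HS-BLASTN on query set $q$ with $n$ threads, so a ratio above 1 means HS-BLASTN is faster. The function name and the dictionary layout keyed by `(q, n)` are hypothetical choices for illustration.

```python
def relative_speedup(t_megablast, t_hsblastn):
    """Compute T_M(q, n) / T_H(q, n) for every (query set, threads) key.

    Both arguments map (q, n) -> running time in any consistent unit.
    Only keys present in both dictionaries are reported.
    """
    speedup = {}
    for key, t_m in t_megablast.items():
        t_h = t_hsblastn.get(key)
        if t_h is None:
            continue  # no matching HS-BLASTN run for this configuration
        if t_h <= 0:
            raise ValueError(f"non-positive HS-BLASTN time for {key}")
        speedup[key] = t_m / t_h
    return speedup
```

Keying by `(q, n)` keeps the per-query-set and per-thread-count comparisons in one table, so the same data also shows how the speedup changes as threads are added.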



Introduction

Identifying sequences (in a target database) that have statistically significant local alignments with a given query is routine in computational biology. BLAST builds a lookup table for the query and scans the database for seeds, which serve as heuristic starting points for significant local alignments. These seeds are first extended into longer ungapped alignments and then into gapped alignments. Searching for homologous sequences in a target database is a bottleneck in bioinformatics because of the exponential growth in the number of biological sequences [3]. Many methods have been proposed to address this issue. They can be divided into two categories: hardware acceleration and improved indexing.
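The seed-and-extend idea described above can be sketched as follows: index every length-`w` word of the query, scan the database sequence for exact word hits (seeds), and extend a seed without gaps until the score falls too far below the best seen (an X-drop rule). This is a minimal illustration under stated assumptions, not NCBI-BLAST or HS-BLASTN code. The word size, the match/mismatch scores, the `x_drop` threshold, and all function names are hypothetical, and gapped extension is omitted.

```python
from collections import defaultdict


def build_lookup_table(query, w):
    """Map every length-w word of the query to its offsets in the query."""
    table = defaultdict(list)
    for i in range(len(query) - w + 1):
        table[query[i:i + w]].append(i)
    return table


def find_seeds(query, subject, w):
    """Scan the subject and return (query_offset, subject_offset) word hits."""
    table = build_lookup_table(query, w)
    seeds = []
    for j in range(len(subject) - w + 1):
        # .get does not insert missing keys into the defaultdict
        for i in table.get(subject[j:j + w], ()):
            seeds.append((i, j))
    return seeds


def extend_ungapped(query, subject, seed, w, match=1, mismatch=-2, x_drop=5):
    """X-drop ungapped extension of a seed to the right, then to the left.

    Returns (score, query_start, subject_start, length) of the best segment.
    """
    i, j = seed
    score = best = w * match  # the seed word itself is an exact match
    best_right = k = 0
    while i + w + k < len(query) and j + w + k < len(subject):
        score += match if query[i + w + k] == subject[j + w + k] else mismatch
        k += 1
        if score > best:
            best, best_right = score, k
        elif best - score > x_drop:
            break  # score dropped more than x_drop below the best: stop
    score = best  # continue leftwards from the best right extension
    best_left = k = 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        score += match if query[i - k - 1] == subject[j - k - 1] else mismatch
        k += 1
        if score > best:
            best, best_left = score, k
        elif best - score > x_drop:
            break
    return best, i - best_left, j - best_left, best_left + w + best_right
```

For instance, with `query = "ACGTACGT"`, `subject = "TTACGTACGTTT"` and `w = 4`, the seed `(0, 2)` extends to the whole query, giving `(8, 0, 2, 8)`. The X-drop rule bounds the work spent on seeds that do not belong to a good alignment.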

