Abstract
Sequence alignment is a long standing problem in bioinformatics. The Basic Local Alignment Search Tool (BLAST) is one of the most popular and fundamental alignment tools. The explosive growth of biological sequences calls for speedup of sequence alignment tools such as BLAST. To this end, we develop high speed BLASTN (HS-BLASTN), a parallel and fast nucleotide database search tool that accelerates MegaBLAST—the default module of NCBI-BLASTN. HS-BLASTN builds a new lookup table using the FMD-index of the database and employs an accurate and effective seeding method to find short stretches of identities (called seeds) between the query and the database. HS-BLASTN produces the same alignment results as MegaBLAST and its computational speed is much faster than MegaBLAST. Specifically, our experiments conducted on a 12-core server show that HS-BLASTN can be 22 times faster than MegaBLAST and exhibits better parallel performance than MegaBLAST. HS-BLASTN is written in C++ and the related source code is available at https://github.com/chenying2016/queries under the GPLv3 license.
Highlights
Identifying sequences having statistically significant local alignments with a given query is routine in computational biology
We compare the performance of HSBLASTN with that of MegaBLAST on each query set under different numbers of CPU threads
T M(q, n) T H(q, n) as the relative speedup achieved by HS-BLASTN in comparison to MegaBLAST when both alignment tools running on query set q under n CPU threads
Summary
Identifying sequences (in a target database) having statistically significant local alignments with a given query is routine in computational biology. BLAST builds a lookup table for the query, and scans the database for seeds, which are heuristic points for significant local alignments. These seeds are extended to longer ungapped alignments and to gapped alignments. Searching homologous sequences in a target database is a bottleneck in bioinformatics due to the exponential growth in the number of biological sequences [3]. Many methods were proposed to address this issue. They can be divided into two categories: hardware acceleration and improved indexing
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.