Abstract

DNA sequence search is a central problem in bioinformatics algorithm development, but searching a large DNA sequence database is computationally expensive. In this paper, we propose an efficient hierarchical DNA sequence search algorithm that improves search speed while maintaining accuracy. For a given query DNA sequence, a fast local search algorithm using histogram features first acts as a filter before the sequences in the database are scanned. Overlapping processing is newly added to improve the robustness of the algorithm. This filtering step excludes a large number of low-similarity DNA sequences from later searching. The Smith-Waterman algorithm is then applied to each remaining sequence. Experimental results using GenBank sequence data show that the proposed algorithm, which combines histogram information with the Smith-Waterman algorithm, is more efficient for DNA sequence search.

Keywords: Fast search, DNA sequence, Histogram feature, Smith-Waterman algorithm, Local search.
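The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the k-mer histogram, the intersection-based similarity, the half-window overlap, the threshold of 0.8, the scoring parameters, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
from collections import Counter


def kmer_histogram(seq, k=1):
    """Count every k-mer in seq (k=1 gives plain base counts). Assumed feature."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def histogram_similarity(query_hist, window_hist):
    """Histogram intersection, normalised by the query's total count (assumed measure)."""
    total = sum(query_hist.values())
    if total == 0:
        return 0.0
    shared = sum(min(c, window_hist.get(key, 0)) for key, c in query_hist.items())
    return shared / total


def passes_filter(query, target, threshold=0.8, k=1):
    """Stage 1: compare the query histogram with overlapping windows of the target.

    Windows have the query's length and advance by half a window, so a match
    that straddles a window boundary is still largely covered by one window.
    """
    w = len(query)
    query_hist = kmer_histogram(query, k)
    if len(target) <= w:
        return histogram_similarity(query_hist, kmer_histogram(target, k)) >= threshold
    step = max(1, w // 2)
    starts = list(range(0, len(target) - w + 1, step))
    if starts[-1] != len(target) - w:
        starts.append(len(target) - w)  # make sure the tail is covered
    return any(
        histogram_similarity(query_hist, kmer_histogram(target[s:s + w], k)) >= threshold
        for s in starts
    )


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Stage 2: best local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def hierarchical_search(query, database, threshold=0.8, top_n=50):
    """Filter by histogram, then rank the surviving sequences by Smith-Waterman score."""
    survivors = {name: seq for name, seq in database.items()
                 if passes_filter(query, seq, threshold)}
    scored = [(smith_waterman_score(query, seq), name) for name, seq in survivors.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(name, score) for score, name in scored[:top_n]]
```

In this sketch the expensive quadratic alignment runs only on sequences that survive the linear-time histogram pass. A histogram compares composition rather than order, so the threshold trades speed against the risk of discarding a true hit.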

Highlights

  • The decipherment of the 3-billion-base human genome sequence, which has been called the Apollo project of the life sciences [1, 2], was completed through international cooperation in April 2003

  • Experimental results: We select the 50 highest-scoring results that the Smith-Waterman algorithm [6] returns over the entire set of DNA sequences, perform the same search using the histogram information algorithm, and calculate the recall and the precision

  • In this paper, we proposed an improved local search algorithm that improves both the speed and the precision of fast DNA sequence search by combining histogram features with the Smith-Waterman dynamic programming algorithm
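The evaluation described in the highlights, which compares the histogram-based results against the top Smith-Waterman results, relies on recall and precision. The Python sketch below uses the standard set-based definitions; the function name and the toy identifiers are hypothetical, and the source does not state exactly how the authors computed the two measures.

```python
def recall_precision(reference_hits, retrieved_hits):
    """Return (recall, precision) of retrieved_hits against reference_hits.

    reference_hits: identifiers treated as ground truth, e.g. the
                    top-scoring Smith-Waterman results.
    retrieved_hits: identifiers returned by the method under test.
    """
    reference = set(reference_hits)
    retrieved = set(retrieved_hits)
    true_positives = len(reference & retrieved)
    recall = true_positives / len(reference) if reference else 0.0
    precision = true_positives / len(retrieved) if retrieved else 0.0
    return recall, precision


# Toy usage: 2 of 4 reference hits are found among 3 retrieved sequences.
recall, precision = recall_precision(["a", "b", "c", "d"], ["a", "b", "x"])
```

Recall measures how many of the reference hits the faster search recovers, while precision measures how many of its returned sequences are reference hits.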


Summary

Introduction

The decipherment of the 3-billion-base human genome sequence, which has been called the Apollo project of the life sciences [1, 2], was completed through international cooperation in April 2003.

A Fast Local Search Algorithm Using Histogram Features for DNA Sequence Database

We have proposed an efficient algorithm that combines histogram features with the Smith-Waterman dynamic programming algorithm [6] in order to improve both speed and precision [13].
