Abstract

Biological sequence comparison is one of the most important and basic problems in computational biology. Due to its high demands for computational power and memory, it is a very challenging task. The well-known algorithm proposed by Smith-Waterman obtains the best local alignments at the expense of very high computing power and huge memory requirements. This paper introduces a new efficient algorithm to locate the longest common subsequences (LCS) in two different DNA sequences. It is based on the convolution between the two DNA sequences: The major sequence is represented in the linked-list X while the minor one is represented in circular linked-list Y. An array of linked lists is established where each linked list is corresponding to an element of the linked-list X and a new node is added to it for each match between the two sequences. If two or more matches in different locations in string Y share the same location in string X, the corresponding nodes will construct a unique linked-list. Accordingly, by the end of processing, we obtain a group of linked-lists containing nodes that reflect all possible matches between the two sequences X and Y. The proposed algorithm has been implemented and tested using C# language. The benchmark test shows very good speedups and indicated that impressive improvements has been achieved.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call