Abstract

Computing a similarity measure for a given set of molecular sequences is an important task in bioinformatics. Weighted sequences have become an active research area because they allow a newer and more precise encoding of molecular structures. The longest common subsequence (LCS) is an extensively studied technique for computing similarity between sequences represented as strings, and it has been used in many applications. There is a current trend to generalize LCS algorithms so that they also work on weighted sequences; the resulting variant of the problem is called the weighted LCS. In this paper, we study the problem of finding the weighted LCS of two weighted sequences. In particular, we present a novel approach to the weighted LCS for a bounded molecular alphabet, constrained by one or two α parameters. Based on the dominant-match-point paradigm, we model the problem as a multiobjective optimization problem. As a result, we propose a novel, efficient and exact algorithm that finds not only the weighted LCS but also the set of all possible solutions. We perform an experimental analysis on simulated and real data to assess the performance of the proposed approach. The experiments show that the proposed algorithm performs well on small instances of both benchmarks. Furthermore, it can be used in many bioinformatics applications that require computing the similarity between short sequence fragments.
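
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the dominant-match-point, multiobjective algorithm proposed in the paper. It assumes one common formulation of the weighted LCS that the abstract does not spell out: a weighted sequence is a list of per-position probability tables, a string is accepted for a sequence when the product of the probabilities along its best embedding is at least that sequence's α, and the answer is every longest string accepted for both sequences. The names `best_embedding_prob`, `weighted_lcs_bruteforce`, `alpha_x` and `alpha_y` are hypothetical.

```python
from typing import Dict, List

# Assumed representation: one {symbol: probability} table per position.
WeightedSeq = List[Dict[str, float]]


def best_embedding_prob(s: str, w: WeightedSeq) -> float:
    """Highest product of probabilities over all ways to embed s
    as a subsequence of w (0.0 if no embedding exists)."""
    # best[j] = best probability of matching s[:j] in the columns seen so far
    best = [1.0] + [0.0] * len(s)
    for column in w:
        # Go right to left so each column matches at most one symbol of s.
        for j in range(len(s), 0, -1):
            cand = best[j - 1] * column.get(s[j - 1], 0.0)
            if cand > best[j]:
                best[j] = cand
    return best[len(s)]


def weighted_lcs_bruteforce(x: WeightedSeq, y: WeightedSeq,
                            alpha_x: float, alpha_y: float,
                            alphabet: str = "ACGT") -> List[str]:
    """All longest strings whose best embedding reaches alpha_x in x
    and alpha_y in y. Exponential time: for tiny instances only."""
    frontier = [""]
    longest = [""]
    while frontier:
        longest = frontier
        # Any prefix of an accepted string is also accepted (probabilities
        # are at most 1), so extending accepted strings loses no solutions.
        frontier = [s + c for s in frontier for c in alphabet
                    if best_embedding_prob(s + c, x) >= alpha_x
                    and best_embedding_prob(s + c, y) >= alpha_y]
    return longest


x = [{"A": 1.0}, {"C": 0.5, "G": 0.5}, {"T": 1.0}]
y = [{"A": 0.9, "C": 0.1}, {"T": 1.0}]
print(weighted_lcs_bruteforce(x, y, alpha_x=0.5, alpha_y=0.5))
```

Using a separate threshold per sequence mirrors the "one or two α parameters" mentioned above; passing the same value for both gives the single-α case. The search is exponential in the answer length, which is why a sketch like this is only usable on very short fragments.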
