Abstract

The Longest Common Subsequence (LCS) problem aims at finding a longest string that is a subsequence of each string from a given set of input strings. This problem has applications, in particular, in the context of bioinformatics, where strings represent DNA or protein sequences. Existing approaches include numerous heuristics, but only a few exact approaches, limited to rather small problem instances. Adopting various aspects from leading heuristics for the LCS, we first propose an exact A∗ search approach, which performs well in comparison to earlier exact approaches in the context of small instances. On the basis of A∗ search we then develop two hybrid A∗–based algorithms in which classical A∗ iterations are alternated with beam search and anytime column search, respectively. A key feature to guide the heuristic search in these approaches is the usage of an approximate expected length calculation for the LCS of uniform random strings. Even for large problem instances these anytime A∗ variants yield reasonable solutions early during the search and improve on them over time. Moreover, they terminate with proven optimality if enough time and memory is given. Furthermore, they yield upper bounds and, thus, quality guarantees when terminated early. We comprehensively evaluate the proposed methods using most of the available benchmark sets from the literature and compare to the current state-of-the-art methods. In particular, our algorithms are able to obtain new best results for 82 out of 117 instance groups. Moreover, in most cases they also provide significantly smaller optimality gaps than other anytime algorithms.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.