Abstract

Sequence alignment is a fundamental problem in computational biology and is also important in theoretical computer science. In this paper, we consider the problem of aligning a set of sequences subject to a given constrained sequence. Given two sequences $$A=a_1a_2\ldots a_n$$ and $$B=b_1b_2\ldots b_n$$ with a given distance function and a constrained sequence $$C=c_1c_2\ldots c_k$$, our goal is to find the optimal sequence alignment of A and B with respect to the constraint C. We investigate several variants of this problem. If $$C=c^k$$, i.e., all characters in C are the same, the optimal constrained pairwise sequence alignment can be computed in $$O(\min \{kn^2,(t-k)n^2\})$$ time, where t is the minimum number of occurrences of character c in A and B. For the variant in which the alignment score between any two consecutive constrained characters in the final alignment is upper bounded by a given value, called GB-CPSA, we give a dynamic programming algorithm with time complexity $$O(kn^4/\log n)$$. For the constrained center-star sequence alignment (CCSA), we prove that it is NP-hard to achieve the optimal alignment even over the binary alphabet. Furthermore, we show a negative result for CCSA: there is no polynomial-time algorithm that approximates CCSA within any constant ratio.
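To make the basic constrained pairwise problem concrete, the following is a minimal sketch of a straightforward dynamic program that runs in time proportional to k times the product of the sequence lengths. It is an illustration only, not the improved algorithm for $$C=c^k$$ claimed above. Two assumptions are not taken from the abstract: the constraint is read as requiring each character of C to appear, in order, in a column where A and B both carry that character, and the distance function is replaced by hypothetical unit costs (`sub_cost`, `gap_cost`). The function name `cpsa_cost` is likewise illustrative.

```python
from math import inf


def cpsa_cost(a, b, c, sub_cost=1, gap_cost=1):
    """Minimum alignment cost of a and b under constraint c (sketch).

    Assumption: every character of c must occur, in order, in an
    alignment column where a and b both hold that same character.
    Unit costs stand in for the unspecified distance function.
    Returns math.inf when no feasible alignment exists.
    """
    n, m, k = len(a), len(b), len(c)
    # D[r][i][j]: cheapest alignment of a[:i] and b[:j] in which the
    # first r constraint characters have already been matched.
    D = [[[inf] * (m + 1) for _ in range(n + 1)] for _ in range(k + 1)]
    D[0][0][0] = 0
    for r in range(k + 1):
        for i in range(n + 1):
            for j in range(m + 1):
                best = D[r][i][j]
                if i > 0:  # a[i-1] aligned with a gap
                    best = min(best, D[r][i - 1][j] + gap_cost)
                if j > 0:  # b[j-1] aligned with a gap
                    best = min(best, D[r][i][j - 1] + gap_cost)
                if i > 0 and j > 0:
                    same = a[i - 1] == b[j - 1]
                    # ordinary match (cost 0) or substitution
                    d = 0 if same else sub_cost
                    best = min(best, D[r][i - 1][j - 1] + d)
                    # this column also satisfies constraint character r
                    if r > 0 and same and a[i - 1] == c[r - 1]:
                        best = min(best, D[r - 1][i - 1][j - 1])
                D[r][i][j] = best
    return D[k][n][m]
```

Under these assumptions, aligning `"aab"` with `"ba"` costs 2 without a constraint, while forcing the two `b` characters into the same column (constraint `"b"`) raises the cost to 3. The table has (k+1)(n+1)(m+1) cells with constant work per cell, which gives the cubic-style bound mentioned above.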
