Abstract
Finding shortest common supersequences (SCS) and longest common subsequences (LCS) for a given set of sequences are two well-known NP-hard problems. They have important applications in many areas, including computational molecular biology (e.g., sequence alignment), data compression, planning, and text editing (e.g., the diff utility in UNIX) [1, 6, 7, 8, 10, 17, 19, 22, 23, 24, 26, 27]. The question of approximating SCS and LCS was raised 15 years ago in [19], and considerable effort has since been spent, without success, in searching for such approximation algorithms.

We attack the question by proving: (i) SCS does not have a polynomial-time linear approximation algorithm, unless P = NP; (ii) there exists a constant $\delta > 0$ such that, if SCS has a polynomial-time approximation algorithm with ratio $\log^{\delta} n$, where $n$ is the number of input sequences, then NP is contained in $\mathrm{DTIME}(2^{\mathrm{polylog}\, n})$; (iii) there exists a constant $\delta > 0$ such that, if LCS has a polynomial-time approximation algorithm with performance ratio $n^{\delta}$, then P = NP. Item (iii) follows straightforwardly from recent breakthrough results in [3]. Items (i) and (ii), however, require new ideas and techniques.

In the second part of the paper, we introduce a new and powerful method for analyzing the average-case performance of algorithms. Despite our non-approximability results for the worst case, we show that a simple greedy algorithm produces, on average, a common supersequence of length $\mathrm{OPT} + O(\mathrm{OPT}^{0.707})$ (or a common subsequence of length $\mathrm{OPT} - O(\mathrm{OPT}^{1/2+\varepsilon})$ for any $\varepsilon > 0$), where OPT denotes the optimal length. Incidentally, our analysis also provides tight upper and lower bounds on the expected LCS and SCS lengths for $n$ random sequences, solving a generalization of another well-known open question on the expected LCS length for two random sequences [2, 5, 22].
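The abstract does not spell out the greedy algorithm. As a hedged illustration only, the sketch below implements one natural greedy heuristic for SCS, sometimes referred to as majority merge: at each step, emit the letter that occurs most often at the front of the remaining sequences, then strip that letter from every sequence that begins with it. The function names `majority_merge` and `is_supersequence` and the alphabetical tie-breaking rule are assumptions introduced here; this is not claimed to be the paper's exact algorithm, and the sketch carries none of the average-case guarantees stated above.

```python
from collections import Counter
from typing import List


def majority_merge(seqs: List[str]) -> str:
    """Greedy common supersequence (hypothetical helper, illustration only).

    Repeatedly emits the most frequent leading character among the
    remaining sequences and removes it from the front of every sequence
    that starts with it. Each step consumes at least one character, so
    the loop terminates.
    """
    rest = [s for s in seqs if s]
    out = []
    while rest:
        counts = Counter(s[0] for s in rest)
        # Assumed tie-break: highest count first, then smallest character.
        best = min(counts, key=lambda c: (-counts[c], c))
        out.append(best)
        # Advance every sequence whose first character was emitted.
        rest = [s[1:] if s[0] == best else s for s in rest]
        rest = [s for s in rest if s]
    return "".join(out)


def is_supersequence(sup: str, sub: str) -> bool:
    """True if `sub` is a (not necessarily contiguous) subsequence of `sup`."""
    it = iter(sup)
    # `ch in it` advances the shared iterator past the first match.
    return all(ch in it for ch in sub)


if __name__ == "__main__":
    inputs = ["abc", "acb", "bac"]
    result = majority_merge(inputs)
    print(result)  # prints "abcabc"
    print(all(is_supersequence(result, s) for s in inputs))  # prints True
```

On this small input the greedy output has length 6, while "bacbc" (length 5) is also a common supersequence, which illustrates that the heuristic need not be optimal on a given instance.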