Abstract

Given a set of strings $S=\{s_1, s_2, \ldots, s_n\}$ over a finite alphabet $\Sigma$, a superstring of $S$ is a string that contains each $s_i$ as a contiguous substring. The shortest superstring (SS) problem is to find a superstring of minimum length. This problem has important applications in computational biology and in data compression (see, respectively, [A. Lesk, ed., Computational Molecular Biology, Sources and Methods for Sequence Analysis, Oxford University Press, Oxford, 1988]; [J. Storer, Data Compression: Methods and Theory, Computer Science Press, Rockville, MD, 1988]). SS is MAX SNP-hard [A. Blum et al., Proc. 23rd Annual ACM Symposium on Theory of Computing, ACM, New York, 1991, pp. 328--336], so it is unlikely that the length of a shortest superstring can be approximated to within an arbitrary constant. Several heuristics have been suggested, and it is conjectured that GREEDY achieves an approximation factor of 2. This, unfortunately, remains an open question. Several linear approximation algorithms for SS have been proposed. The first, by Blum et al. [Proc. 23rd Annual ACM Symposium on Theory of Computing, ACM, New York, 1991, pp. 328--336], guarantees a performance factor of 3. The factor has been successively improved to $2\frac{8}{9}$, $2\frac{5}{6}$, $2\frac{50}{63}$, $2\frac{3}{4}$, $2\frac{2}{3}$, and $2.596$ (see, respectively, [S. Teng and F. Yao, Proc. 34th Annual IEEE Symposium on Foundations of Computer Science, IEEE Computer Society Press, Piscataway, NJ, 1993, pp. 158--165]; [A. Czumaj et al., Proc. First Scandinavian Workshop on Algorithm Theory, Lecture Notes in Comput. Sci. 824, Springer-Verlag, Berlin, 1994, pp. 95--106]; [R. Kosaraju, J. Park, and C. Stein, Proc. 35th Annual IEEE Symposium on Foundations of Computer Science, IEEE Computer Society Press, Piscataway, NJ, 1994, pp. 166--177]; [C. Armen and C. Stein, Proc. 5th Internat. Workshop on Algorithms and Data Structures, Lecture Notes in Comput. Sci. 955, Springer-Verlag, Berlin, 1995, pp. 494--505]; [C. Armen and C. Stein, Proc. Combinatorial Pattern Matching, Lecture Notes in Comput. Sci. 1075, Springer-Verlag, Berlin, 1996, pp. 87--101]; and [D. Breslauer, T. Jiang, and Z. Jiang, J. Algorithms, 24 (1997), pp. 340--353]). In this paper we give an algorithm that guarantees a $2\frac{1}{2}$-approximation factor.
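
To make the superstring definition and the GREEDY heuristic concrete, here is a minimal Python sketch. It follows the usual textbook description of GREEDY (repeatedly merge the two strings with the largest overlap); the abstract does not spell the heuristic out, so this formulation is an assumption, and the function names `overlap` and `greedy_superstring` are hypothetical. It is not the $2\frac{1}{2}$-approximation algorithm of the paper.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Illustrative GREEDY heuristic (assumed standard formulation).

    Returns a superstring of the inputs; it is not guaranteed to be shortest.
    """
    # Remove duplicates, then drop strings already contained in another one:
    # any superstring of the remaining strings covers them automatically.
    items = list(dict.fromkeys(strings))
    items = [s for s in items if not any(s != t and s in t for t in items)]

    while len(items) > 1:
        # Find the ordered pair (i, j) with the largest overlap.
        best_k, best_i, best_j = -1, 0, 1
        for i, j in permutations(range(len(items)), 2):
            k = overlap(items[i], items[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j
        # Merge the pair, writing the overlapping part only once.
        merged = items[best_i] + items[best_j][best_k:]
        items = [s for idx, s in enumerate(items) if idx not in (best_i, best_j)]
        items.append(merged)

    return items[0] if items else ""
```

For example, `greedy_superstring(["abc", "bcd", "cde"])` returns `"abcde"`, which contains each input as a contiguous substring. The pairwise search recomputes overlaps on every round, which keeps the sketch short at the cost of efficiency.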
