Abstract
The shortest superstring problem for a given set of strings is to find a string of minimum length such that each input string is a substring of it. This problem is known to be NP-complete. A simple and popular approximation algorithm for this problem is GREEDY, which at each step merges a pair of strings with maximum overlap. If more than one pair has maximum overlap, it picks one of them at random. In this paper, we modify GREEDY so that, instead of picking a pair at random, it picks the pair for which the overlap in the next step is maximum. We analyze our algorithm and compare it with GREEDY on different types of input. We implement both algorithms, and the experimental results show that our algorithm can outperform GREEDY substantially in many cases and that, in general, it performs the same as or better than GREEDY.
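The tie-breaking idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`overlap`, `merge`, `best_overlap`, `superstring`) and the `lookahead` flag are hypothetical, the random tie-break of GREEDY is replaced by "take the first tied pair" so that the output is reproducible, and ties that remain after the one-step lookahead are also broken by taking the first candidate.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(strings):
    """Remove duplicates and strings that occur inside another input string."""
    unique = list(dict.fromkeys(strings))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def merge(strings, i, j):
    """Return a new list where strings[i] and strings[j] are merged (i first)."""
    a, b = strings[i], strings[j]
    rest = [s for k, s in enumerate(strings) if k not in (i, j)]
    return rest + [a + b[overlap(a, b):]]


def best_overlap(strings):
    """Maximum overlap over all ordered pairs (0 if fewer than two strings)."""
    index_pairs = permutations(range(len(strings)), 2)
    return max((overlap(strings[i], strings[j]) for i, j in index_pairs), default=0)


def superstring(strings, lookahead=False):
    """Greedy merging; with lookahead=True, ties are broken by next-step overlap."""
    strings = drop_contained(strings)
    while len(strings) > 1:
        pairs = list(permutations(range(len(strings)), 2))
        top = max(overlap(strings[i], strings[j]) for i, j in pairs)
        tied = [(i, j) for i, j in pairs if overlap(strings[i], strings[j]) == top]
        if lookahead:
            # Assumed tie-break: prefer the merge that leaves the largest
            # maximum overlap available in the following step.
            i, j = max(tied, key=lambda p: best_overlap(merge(strings, *p)))
        else:
            # Deterministic stand-in for GREEDY's random choice among ties.
            i, j = tied[0]
        strings = merge(strings, i, j)
    return strings[0] if strings else ""


S = ["TAC", "TAG", "CGTA"]
print(superstring(S))                  # plain greedy with first-pair tie-break
print(superstring(S, lookahead=True))  # greedy with one-step lookahead
```

On this three-string input, two pairs tie with overlap 2 in the first step. The first-pair tie-break happens to pick the merge that leaves no overlap for the next step and ends with a superstring of length eight, while the lookahead variant picks the other merge and ends with length seven. A random tie-break would reach either result.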
Highlights
Given a set of strings S, the shortest superstring problem (SSP) for S is to find a string s of minimum length such that each string in S is a substring of s
It is natural that the running time of our algorithm will be higher than that of the Greedy Algorithm, because our algorithm looks one step further to decide which strings to take in the current step
As the input strings become more random, our algorithm and the Greedy Algorithm get closer in performance: most of the time our algorithm is better than the Greedy Algorithm, and the rest of the time the two perform the same
Summary
Given a set of strings S, the shortest (common) superstring problem (SSP) for S is to find a string s of minimum length such that each string in S is a substring of s. For example, let S = {TAC, TAG, CGTA}. The shortest possible superstring s is TACGTAG, of length seven. The string TAC is a substring at the front of s, TAG is a substring at the end of s, and CGTA is a substring in the interior of s. The strings in S contain two occurrences of C in total, but s contains only one. This is the maximum compression possible in s. CGTACTAG is another superstring of all strings in S, but it is not the shortest, as it has length eight.
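The example can be checked by exhaustive search. The following sketch assumes S = {TAC, TAG, CGTA}, the three strings named in the example, and assumes that no input string is a substring of another; under that assumption, trying every ordering and merging neighbours with maximum overlap yields a shortest superstring. The helper names are hypothetical, and the approach is only practical for very small sets because it enumerates all orderings.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def is_superstring(s: str, strings) -> bool:
    """True if every string in the collection occurs inside s."""
    return all(t in s for t in strings)


def shortest_superstring_bruteforce(strings):
    """Try every ordering, merging consecutive strings with maximum overlap."""
    best = None
    for order in permutations(strings):
        s = order[0]
        for t in order[1:]:
            s += t[overlap(s, t):]
        if best is None or len(s) < len(best):
            best = s
    return best


S = ["TAC", "TAG", "CGTA"]
best = shortest_superstring_bruteforce(S)
print(best, len(best))                 # TACGTAG 7
print(is_superstring("CGTACTAG", S))   # True, but its length is eight
```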