Relationship between superstring and compression measures: New insights on the greedy conjecture

Bastien Cazaux, Eric Rivals

doi:10.1016/j.dam.2017.04.017

Abstract

A superstring of a set of words is a string that contains each input word as a substring. Given such a set, the Shortest Superstring Problem (SSP) asks for a superstring of minimum length. SSP is an important theoretical problem related to the Asymmetric Travelling Salesman Problem, and also has practical applications in data compression and in bioinformatics, where it models the question of assembling a genome from a set of sequencing reads. Unfortunately, SSP is known to be NP-hard even on a binary alphabet, and it is also hard to approximate with respect to the superstring length or to the compression achieved by the superstring. Even the variant in which all words share the same length r, called r-SSP, is NP-hard whenever r > 2. Numerous involved approximation algorithms achieve an approximation ratio above 2 for the superstring, but remain difficult to implement in practice. In contrast, the greedy conjecture asked in 1988 whether a simple greedy algorithm achieves a ratio of 2 for SSP. Here, we present a novel approach to bound the superstring approximation ratio with the compression ratio, which, when applied to the greedy algorithm, shows an approximation ratio of 2 for 3-SSP, and also that greedy achieves ratios smaller than 2. This leads to a new version of the greedy conjecture.
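To make the greedy algorithm mentioned in the abstract concrete, here is a minimal Python sketch. It assumes the usual formulation (repeatedly merge the two words with the largest overlap until one word remains). The function names `overlap` and `greedy_superstring`, the preprocessing step, and the tie-breaking rule are illustrative assumptions, not taken from the paper.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(words):
    """Return a superstring of `words` built by greedy merging.

    Assumption: duplicates and words contained in another word are
    dropped first, since any superstring of the rest already covers them.
    """
    pool = sorted(set(words))
    pool = [w for w in pool if not any(w != v and w in v for v in pool)]

    while len(pool) > 1:
        # Find the ordered pair (i, j) with the largest overlap.
        # Ties are broken by iteration order (an arbitrary choice).
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the pair: keep pool[best_i], append the non-overlapping
        # tail of pool[best_j].
        merged = pool[best_i] + pool[best_j][best_k:]
        pool = [w for idx, w in enumerate(pool)
                if idx not in (best_i, best_j)] + [merged]

    return pool[0] if pool else ""


if __name__ == "__main__":
    words = ["abc", "bcd", "cde"]
    s = greedy_superstring(words)
    print(s)                                    # abcde
    print(sum(len(w) for w in words) - len(s))  # compression: 9 - 5 = 4
```

This quadratic-time version favours readability; the output is always a superstring of the input, but it is not guaranteed to be a shortest one.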

Highlights

Given a set of p words P := {s1, s2, ..., sp} over a finite alphabet Σ, a superstring of P is a string containing each si for 1 ≤ i ≤ p as a substring.
The Shortest Superstring Problem is a crucial problem in computer science and has many practical applications in data compression and in bioinformatics, where it models genome assembly [8].
We exploit the relationship between the two approximation measures, the superstring length and the compression, to bound the superstring ratio as a function of the compression ratio, which to our knowledge is new.
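The following derivation is a sketch of how a compression ratio can translate into a superstring ratio for r-SSP. It is a reconstruction under stated assumptions, not a quotation of the paper's proof. It assumes an algorithm A with compression ratio c in [0, 1], p distinct input words of common length r, ∥P∥ denoting the sum of the word lengths, and s_opt(P) denoting a shortest superstring. The value c = 1/2 used at the end is the classical compression guarantee of the greedy algorithm.

```latex
\begin{align*}
% Definition of the compression ratio c of algorithm A:
\|P\| - |A(P)| &\ge c \,\bigl(\|P\| - |s_{opt}(P)|\bigr) \\
% Rearranged into an upper bound on the output length:
|A(P)| &\le (1 - c)\,\|P\| + c\,|s_{opt}(P)| \\
% p distinct words of length r start at p distinct positions of any superstring:
|s_{opt}(P)| &\ge p + r - 1
  \quad\Longrightarrow\quad \|P\| = p\,r \le r\,|s_{opt}(P)| \\
% Combining the two previous lines:
|A(P)| &\le \bigl((1 - c)\,r + c\bigr)\,|s_{opt}(P)| \\
% With c = 1/2:
|A(P)| &\le \frac{r + 1}{2}\,|s_{opt}(P)|
\end{align*}
```

Under these assumptions the bound evaluates to 2 for r = 3 and to 3/2 for r = 2, which agrees with the ratios quoted on this page for 3-SSP and 2-SSP.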

Summary

Introduction

Given a set of p words P := {s1, s2, ..., sp} over a finite alphabet Σ, a superstring of P is a string containing each si for 1 ≤ i ≤ p as a substring. It has been proven that when all input words have length 4 (4-SSP), the greedy algorithm achieves a superstring ratio of at most 2, as stated by the conjecture [11]. This proof is valid only for words of length 4 and cannot be adapted to words of length 3, for instance. We get a tight superstring ratio of 3/2 for 2-SSP, thereby demonstrating that the greedy algorithm can achieve a ratio strictly smaller than 2. This shows first that the general relationship between the superstring and compression measures is important and can serve for future research. By definition A achieves the compression ratio comp(A), so using the previous inequality we get comp(A) × (∥P∥ − |sopt(P)|)

Approximation of r-SSP

Conclusion

Journal: Discrete Applied Mathematics
Publication Date: May 19, 2017
Citations: 3
License type: cc-by

Similar Papers

Explicit Inapproximability Bounds for the Shortest Superstring Problem
Virginia Vassilevska
01 Jan 2004

Parallel and sequential approximation of shortest superstrings
Artur Czumaj ... Marek Piotrów
01 Jan 1993

Improved approximation guarantees for shortest superstrings using cycle classification by overlap to length ratios
Matthias Englert ... Pavel Veselý
09 Jun 2022

The Shortest Superstring Problem
Theodoros P. Gevezes ... Leonidas S. Pitsoulis
01 Jan 2014
