Abstract

Given a set of $n$ strings of length $L$ and a radius $d$, the closest string problem (CSP for short) asks for a string $t_{sol}$ that is within Hamming distance $d$ of each of the given strings. The problem is known to be NP-hard, and its optimization version admits a polynomial time approximation scheme (PTAS). Parameterized algorithms have since been developed to solve the problem when $d$ is small. In this paper, with a new approach (called the 3-string approach), we first design a parameterized algorithm for binary strings that runs in $O(nL + nd^3 \cdot 6.731^d)$ time, while the previous best runs in $O(nL + nd \cdot 8^d)$ time. We then extend the algorithm to arbitrary alphabet sizes, obtaining an algorithm that runs in time $O\left(nL + nd \cdot \left(1.612\left(|\Sigma| + \beta^2 + \beta - 2\right)\right)^d\right)$, where $|\Sigma|$ is the alphabet size and $\beta = \alpha^2 + 1 - 2\alpha^{-1} + \alpha^{-2}$ with $\alpha = \sqrt[3]{\sqrt{|\Sigma| - 1} + 1}$. This new time bound is better than the previous best for small alphabets, including the very important case where $|\Sigma| = 4$ (i.e., the case of DNA strings).
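To make the problem statement and the running-time bound concrete, the following is a minimal Python sketch. It is not the 3-string algorithm of the paper: `brute_force_csp` is a naive exhaustive search included only to illustrate the definition, and all function names are hypothetical. `bound_base` evaluates the base $1.612(|\Sigma| + \beta^2 + \beta - 2)$ of the exponential term, assuming $\alpha$ is the cube root of $\sqrt{|\Sigma| - 1} + 1$.

```python
from itertools import product
from math import sqrt


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_center(t, strings, d):
    """True if t is within Hamming distance d of every input string."""
    return all(hamming(t, s) <= d for s in strings)


def brute_force_csp(strings, d, alphabet):
    """Naive search over all |alphabet|^L candidates.

    Illustration of the problem definition only; this is NOT the
    parameterized algorithm described in the abstract.
    """
    length = len(strings[0])
    for cand in product(alphabet, repeat=length):
        t = "".join(cand)
        if is_center(t, strings, d):
            return t
    return None  # no string lies within distance d of all inputs


def bound_base(sigma):
    """Base of the exponential term, 1.612 * (|Sigma| + beta^2 + beta - 2).

    Assumes alpha = cube root of (sqrt(|Sigma| - 1) + 1).
    """
    alpha = (sqrt(sigma - 1) + 1) ** (1 / 3)
    beta = alpha**2 + 1 - 2 / alpha + alpha**-2
    return 1.612 * (sigma + beta**2 + beta - 2)
```

For example, `brute_force_csp(["AAC", "ACA", "CAA"], 1, "AC")` looks for a string that differs from each input in at most one position, and `bound_base(4)` gives the base of the exponential term for DNA strings.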
