Abstract

Let S be a set of k strings over an alphabet Σ, where each string has a length between ℓ and n. The Closest Substring Problem (CSSP) is to find the minimal integer d (and a corresponding string t of length ℓ) such that each string s ∈ S has a substring of length ℓ with Hamming distance at most d to t. We call t a closest substring of S. For ℓ = n, this problem is known as the Closest String Problem (CSP). Particularly in computational biology, the CSP and CSSP have found numerous practical applications, such as identifying regulatory motifs and approximate gene clusters, and in degenerate primer design. We study ILP formulations for both problems. Our experiments show that a position-based formulation for the CSP performs very well on real-world instances arising from biology. Even on randomly generated instances that are hard to solve to optimality, solving the root relaxation leads to solutions very close to the optimum. For the CSSP, we give a new formulation that is polytope-wise stronger than a straightforward extension of the CSP formulation. Furthermore, we propose a strengthening constraint class that reduces the running time.
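
To make the problem definition concrete, the following is a minimal brute-force sketch in Python. It is not the ILP approach studied in the paper: it simply enumerates all |Σ|^ℓ candidate strings t, so it is only usable on tiny instances. The function names (hamming, substring_distance, closest_substring) are hypothetical and chosen for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def substring_distance(t, s):
    """Smallest Hamming distance between t and any length-len(t) substring of s."""
    length = len(t)
    return min(hamming(t, s[i:i + length]) for i in range(len(s) - length + 1))


def closest_substring(strings, length, alphabet):
    """Exhaustive search for (d, t): illustration only, exponential in `length`.

    d is the minimal radius such that every string in `strings` contains a
    substring of the given length within Hamming distance d of t.
    """
    best_d, best_t = None, None
    for candidate in product(alphabet, repeat=length):
        t = "".join(candidate)
        # The radius of t is determined by the string that matches it worst.
        d = max(substring_distance(t, s) for s in strings)
        if best_d is None or d < best_d:
            best_d, best_t = d, t
    return best_d, best_t


if __name__ == "__main__":
    # CSSP instance: substring length 3 is shorter than the input strings.
    print(closest_substring(["AAACGT", "CGTTTT", "GGCGTG"], 3, "ACGT"))
    # CSP instance: the substring length equals the common string length.
    print(closest_substring(["ACGT", "ACCT", "TCGT"], 4, "ACGT"))
```

In the first call, "CGT" occurs exactly in all three strings, so the optimal distance is 0. In the second call, the substring length equals the string length, which is the CSP special case; no string is within distance 0 of all three inputs, and "ACGT" achieves distance 1.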

