String Noninclusion Optimization Problems

Anatoly R. Rubinov, Vadim G. Timkovsky

doi:10.1137/s0895480192234277

Abstract

For every string inclusion relation there are two optimization problems: find a longest string included in every string of a given finite language, and find a shortest string including every string of a given finite language. For example, two well-known pairs of problems, the longest common substring (or subsequence) problem and the shortest common superstring (or supersequence) problem, are interpretations of these two problems. In this paper we consider a class of opposite problems connected with string noninclusion relations: find a shortest string included in no string of a given finite language, and find a longest string including no string of a given finite language. The predicate "string $\alpha$ is not included in string $\beta$" is interpreted as either "$\alpha$ is not a substring of $\beta$" or "$\alpha$ is not a subsequence of $\beta$". The main purpose is to determine the complexity status of the string noninclusion optimization problems. Using graph approaches, we present polynomial-time algorithms for the first interpretation and NP-hardness proofs for the second. We also discuss restricted versions of the problems, correlations between the string inclusion and noninclusion problems, and generalized problems, which are inclusion problems for one language and noninclusion problems for another. In applications, the string inclusion problems are used to find a similarity between any structures that can be represented by strings. Correspondingly, the string noninclusion problems can be used to find a nonsimilarity. Such problems occur in computational molecular biology, data compression, pattern recognition, and flexible manufacturing. The generalized problems above arise naturally in all of these applied areas. Apart from this practical reason, we hope that studying the string noninclusion problems will yield a deeper understanding of the string inclusion problems.
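To make the first noninclusion problem concrete, the following is a minimal brute-force sketch in Python. It is not the graph-based method of the paper and runs in exponential time; it only illustrates the problem statement under both interpretations of inclusion. The function names (`is_substring`, `is_subsequence`, `shortest_nonincluded`) and the explicit `alphabet` parameter are assumptions introduced here for illustration.

```python
from itertools import product


def is_substring(a, b):
    """True if a occurs in b as a contiguous block."""
    return a in b


def is_subsequence(a, b):
    """True if a can be obtained from b by deleting characters."""
    remaining = iter(b)
    # Each membership test consumes `remaining` up to the matched character,
    # so the characters of a must appear in b in order.
    return all(ch in remaining for ch in a)


def shortest_nonincluded(language, alphabet, included):
    """Return a shortest string over `alphabet` included in no word of `language`.

    Candidates are enumerated by increasing length (illustrative brute force).
    The search stops at the latest one step past the longest word, because a
    string longer than every word cannot be included in any of them.
    """
    if not alphabet:
        raise ValueError("alphabet must be nonempty")
    length = 0
    while True:
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            if not any(included(candidate, word) for word in language):
                return candidate
        length += 1


language = ["abab"]
print(shortest_nonincluded(language, "ab", is_substring))    # aa
print(shortest_nonincluded(language, "ab", is_subsequence))  # aaa
```

For the one-word language {abab} over the alphabet {a, b}, the string aa is not a substring of abab, yet it is a subsequence, so the two interpretations give different answers (aa and aaa).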
