Non-similarity combinatorial problems

Anatoly R Rubinov, Vadim G Timkovsky

doi:10.1016/0303-2647(93)90064-j

Abstract

Similarity problems intensively investigated in computational molecular biology have the following two stringology models: find the longest string included in every string of a given finite language, and find the shortest string including every string of a given finite language. These two problems are exemplified by two well-known pairs of problems: the longest common subsequence (or substring) problem and the shortest common supersequence (or superstring) problem. In this paper we consider the opposite problems, connected with string non-inclusion relations: find the shortest string included in no string of a given finite language, and find the longest string including no string of a given finite language. The predicate “string α is not included in string β” is interpreted either as “α is not a subsequence of β” or as “α is not a substring of β”. The main purpose is to determine the complexity status of the non-similarity problems. Using graph approaches, we present NP-hardness proofs for the first interpretation and polynomial-time algorithms for the second one. Special cases of the problems and related issues are also discussed.
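To make the two interpretations of non-inclusion concrete, the following is a minimal brute-force sketch of the first non-similarity problem (the shortest string included in no string of a finite language). It is not the graph-based method of the paper and runs in exponential time; the function names `is_substring`, `is_subsequence`, and `shortest_non_included` are hypothetical and chosen only for illustration.

```python
from itertools import product


def is_substring(a: str, b: str) -> bool:
    """True if a occurs in b as a contiguous block."""
    return a in b


def is_subsequence(a: str, b: str) -> bool:
    """True if a can be obtained from b by deleting characters."""
    it = iter(b)
    # Each `ch in it` consumes the iterator up to the first match,
    # so characters of a must appear in b in the same order.
    return all(ch in it for ch in a)


def shortest_non_included(language, alphabet, included):
    """Illustrative exhaustive search (assumption: not the paper's algorithm).

    Returns a shortest string over `alphabet` that is included in no
    string of `language` under the predicate `included`, or None if the
    alphabet is empty and no such string exists.
    """
    # Any string longer than the longest word cannot be included in it,
    # so the search can stop one past the maximum word length.
    limit = max((len(w) for w in language), default=-1) + 1
    for n in range(limit + 1):
        for letters in product(alphabet, repeat=n):
            candidate = "".join(letters)
            if not any(included(candidate, w) for w in language):
                return candidate
    return None
```

For the one-word language {"aba"} over the alphabet {a, b}, the two interpretations give different answers: `shortest_non_included(["aba"], "ab", is_substring)` returns `"aa"`, whereas `shortest_non_included(["aba"], "ab", is_subsequence)` returns `"bb"`, because "aa" is a subsequence of "aba" but not a substring of it.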

Full Text