We introduce the novel Nearest Pattern Constrained String (NPCS) problem: find a minimum set $Q$ of character mutation, insertion, and deletion edit operations sufficient to modify a string $x$ so that it contains every string in a pattern set $P$ as a contiguous substring and no string in a forbidden pattern set $F$ as a contiguous substring. Letting $\Sigma$ be the alphabet of allowed characters, and letting $\eta$ and $\Upsilon$ be, respectively, the length of the longest string and the sum of all string lengths in $P \cup F$, we show that NPCS is fixed-parameter tractable in $|P|$, with time complexity $O(2^{|P|} \cdot \Upsilon \cdot |\Sigma| \cdot (|P| + \eta) \cdot (|x| + 1))$. Additionally, we consider a generalization of NPCS that allows constraints based on the membership of substrings in regular languages. In particular, we introduce a problem we call String Editing under Substring in Language Constraints (StrEdit-SILC): given a wildcard-free string $x \in \Sigma^*$, a finite set of regular languages $R = \{L_1, L_2, \ldots\}$, and a regular language $L_F$, the objective is to find a minimum-cost set $Q$ of mutation, insertion, and deletion edit operations that suffices to convert $x$ into a string $x' \in \Sigma^*$ in which no substring belongs to $L_F$ and, for every $L_i \in R$, some substring belongs to $L_i$. Letting $\Psi$ and $\varpi$ be, respectively, the sum of all regular expression lengths and the longest regular expression length for the languages in $R \cup \{L_F\}$, and letting $C_{\mathrm{mid}} \in \mathbb{N}$ be the maximum cost of an edit operation, we show that StrEdit-SILC is fixed-parameter tractable with respect to $\Psi$, with time complexity $O(2^{\Psi} \cdot |x| \cdot (\varpi \cdot |\Sigma| + C_{\mathrm{mid}}))$. However, we also show that StrEdit-SILC is MAX-SNP-hard and otherwise difficult to approximate under stringent constraints.
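The factors in the NPCS bound suggest a dynamic program over (position in $x$, automaton state, subset of $P$ already found), with $|\Sigma|$ transitions per state. The Python below is one such program, offered as a minimal sketch and not necessarily the authors' algorithm. It assumes unit edit costs, nonempty patterns, and that $x$ and all patterns are written over $\Sigma$. It builds an Aho-Corasick automaton over $P \cup F$ and runs Dijkstra's algorithm over the states; the names `build_automaton` and `npcs_min_edits` are hypothetical. It returns only the minimum number of edits ($|Q|$), not $Q$ itself, and the heap adds a logarithmic factor absent from the stated bound.

```python
import heapq
from collections import deque

def build_automaton(patterns, alphabet):
    """Aho-Corasick automaton: complete goto table and, per state,
    the indices of all patterns ending there (including via suffix links)."""
    goto, out = [{}], [set()]
    for idx, p in enumerate(patterns):  # build the trie
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(idx)
    fail = [0] * len(goto)
    queue = deque()
    for ch in alphabet:  # depth-1 states fail to the root
        if ch in goto[0]:
            queue.append(goto[0][ch])
        else:
            goto[0][ch] = 0
    while queue:  # BFS fills failure links and missing transitions
        s = queue.popleft()
        for ch in alphabet:
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = goto[fail[s]][ch]
                out[t] |= out[fail[t]]
                queue.append(t)
            else:
                goto[s][ch] = goto[fail[s]][ch]
    return goto, out

def npcs_min_edits(x, P, F, alphabet):
    """Minimum unit-cost edits turning x into a string containing every
    pattern in P and none in F; None if no such string exists."""
    goto, out = build_automaton(list(P) + list(F), alphabet)
    p_bits = [0] * len(goto)   # required patterns recognized at each state
    bad = [False] * len(goto)  # True if a forbidden pattern ends at the state
    for s in range(len(goto)):
        for idx in out[s]:
            if idx < len(P):
                p_bits[s] |= 1 << idx
            else:
                bad[s] = True
    full = (1 << len(P)) - 1
    start = (0, 0, 0)  # (chars of x consumed, automaton state, mask of P found)
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, (i, s, mask) = heapq.heappop(heap)
        if d > dist[(i, s, mask)]:
            continue  # stale heap entry
        if i == len(x) and mask == full:
            return d
        moves = []
        if i < len(x):
            moves.append((1, (i + 1, s, mask)))  # delete x[i]
        for ch in alphabet:
            t = goto[s][ch]
            if bad[t]:
                continue  # emitting ch would complete a forbidden pattern
            nmask = mask | p_bits[t]
            moves.append((1, (i, t, nmask)))  # insert ch
            if i < len(x):
                cost = 0 if ch == x[i] else 1  # keep or mutate x[i]
                moves.append((cost, (i + 1, t, nmask)))
        for c, state in moves:
            if d + c < dist.get(state, float("inf")):
                dist[state] = d + c
                heapq.heappush(heap, (d + c, state))
    return None
```

For example, `npcs_min_edits("abc", ["cd"], ["b"], "abcd")` returns `2` (delete `b`, insert `d`, giving `acd`). Dijkstra is used rather than a plain layer-by-layer DP because insertions keep the position in $x$ fixed, so states within a layer can reach one another.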
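For StrEdit-SILC, a comparable sketch tracks, for every language $L$ in $R \cup \{L_F\}$, the set of active states of an automaton for $\Sigma^* L$: start states are re-injected after every emitted character, so an accepting state becomes active exactly when some substring ending there lies in $L$. The Python below is an illustration under stated assumptions, not the authors' construction. It assumes each language is already given as an epsilon-free NFA in a hypothetical `(start_states, accept_states, delta)` encoding, with `delta` mapping `(state, char)` to a set of states, so regular-expression-to-NFA conversion is omitted. The edit costs are parameters, and the function names are hypothetical.

```python
import heapq
import itertools

# Hypothetical NFA encoding (assumption): (start_states, accept_states, delta),
# epsilon-free, with delta mapping (state, char) -> set of successor states.

def step(nfas, active, ch):
    """Advance every language's Sigma*-L simulation by one emitted character."""
    nxt = set()
    for k, q in active:
        nxt.update((k, r) for r in nfas[k][2].get((q, ch), ()))
    for k, (starts, _, _) in enumerate(nfas):  # a new substring may start here
        nxt.update((k, s) for s in starts)
    return frozenset(nxt)

def hits(nfas, active):
    """Languages with an active accepting state, i.e. a substring ending
    at the current position belongs to that language."""
    return {k for k, q in active if q in nfas[k][1]}

def stredit_silc(x, R, LF, alphabet, c_mut=1, c_ins=1, c_del=1):
    """Minimum edit cost so that every language in R has a substring in the
    result and LF has none; None if infeasible."""
    nfas = list(R) + [LF]
    forb = len(R)  # index of the forbidden language
    full = (1 << len(R)) - 1
    init = frozenset((k, s) for k, (starts, _, _) in enumerate(nfas) for s in starts)
    found0 = hits(nfas, init)
    if forb in found0:
        return None  # empty string in LF: every string violates the constraint
    start = (0, init, sum(1 << k for k in found0))
    dist = {start: 0}
    tie = itertools.count()  # tie-breaker so frozensets are never compared
    heap = [(0, next(tie), start)]
    while heap:
        d, _, state = heapq.heappop(heap)
        if d > dist[state]:
            continue  # stale heap entry
        i, active, mask = state
        if i == len(x) and mask == full:
            return d
        moves = []
        if i < len(x):
            moves.append((c_del, (i + 1, active, mask)))  # delete x[i]
        for ch in alphabet:
            nxt = step(nfas, active, ch)
            found = hits(nfas, nxt)
            if forb in found:
                continue  # emitting ch would complete a substring in LF
            nmask = mask | sum(1 << k for k in found)
            moves.append((c_ins, (i, nxt, nmask)))  # insert ch
            if i < len(x):
                cost = 0 if ch == x[i] else c_mut  # keep or mutate x[i]
                moves.append((cost, (i + 1, nxt, nmask)))
        for c, nstate in moves:
            if d + c < dist.get(nstate, float("inf")):
                dist[nstate] = d + c
                heapq.heappush(heap, (d + c, next(tie), nstate))
    return None
```

The set of active states ranges over subsets of all NFA states, which is where a factor exponential in the total automaton size, such as $2^{\Psi}$, would come from in an approach of this kind.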