Abstract

Regularities in strings arise in various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology and many others. A common notion to describe regularity in a string T is a cover, which is a string C for which every letter of T lies within some occurrence of C. The alignment of the cover repetitions in the given text is called a tiling. In many applications finding exact repetitions is not sufficient, due to the presence of errors. In this paper, we use a new approach for handling errors in coverable phenomena and define the approximate cover problem (ACP), in which we are given a text that is a sequence of some cover repetitions with possible mismatch errors, and we seek a string that covers the text with the minimum number of errors. We first show that the ACP is NP-hard, by studying the cover-length relaxation of the ACP, in which the requested length of the approximate cover is also given with the input string. We show that this relaxation is already NP-hard. We also study two further relaxations of the ACP, which we call the partial-tiling relaxation of the ACP and the full-tiling relaxation of the ACP, in which a tiling of the requested cover is also given with the input string. A given full tiling retains all the occurrences of the cover before the errors, while in a partial tiling there can be additional occurrences of the cover that are not marked by the tiling. We show that the partial-tiling relaxation can be solved in polynomial time and give experimental evidence that the full-tiling relaxation can also be solved in polynomial time. The study of these relaxations, besides shedding further light on the complexity of the ACP, also involves a deep understanding of the properties of covers, yielding some key lemmas and observations that may be helpful for a future study of regularities in the presence of errors.
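To make the definitions of cover and tiling concrete, the following is a minimal Python sketch. It is an illustration only and is not taken from the paper: the function names `is_cover`, `apply_tiling` and `hamming` are hypothetical, and a tiling is assumed to be represented as a list of 0-based start positions of the cover occurrences.

```python
from typing import List, Optional


def is_cover(c: str, t: str) -> bool:
    """Return True if every letter of t lies within some occurrence of c."""
    m = len(c)
    if m == 0 or m > len(t):
        return False
    covered_up_to = 0  # first index of t not yet covered
    for i in range(len(t) - m + 1):
        if i > covered_up_to:
            return False  # index covered_up_to can no longer be covered
        if t.startswith(c, i):
            covered_up_to = i + m
    return covered_up_to == len(t)


def apply_tiling(c: str, starts: List[int], n: int) -> Optional[str]:
    """Write c at each start position; return the length-n string, or None
    if the occurrences conflict, run past the end, or leave a gap."""
    out: List[Optional[str]] = [None] * n
    for s in starts:
        if s < 0 or s + len(c) > n:
            return None
        for j, ch in enumerate(c):
            if out[s + j] is not None and out[s + j] != ch:
                return None  # overlapping occurrences disagree
            out[s + j] = ch
    if any(ch is None for ch in out):
        return None  # some position is not covered
    return "".join(ch for ch in out if ch is not None)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


clean = apply_tiling("aba", [0, 3, 5, 8], 11)  # tiles may overlap
noisy = "abcababaabb"                           # clean text with two mismatches
```

Here `clean` is a string exactly covered by `aba`, and `hamming(clean, noisy)` counts the errors that this candidate cover and tiling would have to explain in the noisy text.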

Highlights

  • Regularities in strings arise in various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology and many others

  • In this paper we extend the approach of [1] to the notion of covers and define the approximate cover problem (ACP), in which we are given a text that is a sequence of some cover repetitions with possible mismatch errors, and we seek a string that covers the text with the minimum number of errors

  • We prove that the ACP is NP-hard by studying a relaxation of this problem, which we call the cover-size relaxation of the ACP


Summary

Introduction

Regularities in strings arise in various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology and many others. It is desirable to broaden the definition of periodicity and study wider classes of repetitive patterns in strings. One common such notion is that of a cover: a string C is a cover of a string T if every letter of T lies within some occurrence of C. While covers are a significant generalization of the notion of periods as formalizing regularities in strings, they are still restrictive, in the sense that it remains unlikely that an arbitrary string has a cover shorter than the word itself. For this reason, different variants of quasi-periodicity have been introduced. In one such approach, a (smallest) repeat generating a string with the minimum total number of mismatches with the input string is sought. Extending this approach to approximate covers is the topic of this paper.
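As a rough illustration of what "a string that covers the text with the minimum number of errors" means when the cover length is given, here is a brute-force Python sketch for very small inputs. It is not the paper's algorithm: it simply tries every candidate of the requested length, so its running time is exponential in that length. The function names are hypothetical, and restricting candidates to letters that occur in the text is a simplifying assumption of this sketch.

```python
from itertools import product
from typing import Optional, Tuple

INF = float("inf")


def min_cover_errors(c: str, t: str) -> float:
    """Fewest mismatches between t and any same-length string covered by c."""
    m, n = len(c), len(t)
    if m == 0 or m > n:
        return INF
    # Shifts d by which the next occurrence may follow the previous one:
    # d == m (adjacent), or d < m where c agrees with itself on the overlap.
    shifts = [d for d in range(1, m + 1) if c[d:] == c[:m - d]]
    # best[p]: fewest mismatches on t[:p + m] when the last occurrence starts at p
    best = [INF] * (n - m + 1)
    best[0] = sum(x != y for x, y in zip(c, t[:m]))
    for p in range(n - m + 1):
        if best[p] == INF:
            continue
        for d in shifts:
            q = p + d
            if q > n - m:
                break
            # The occurrence at q adds the last d letters of c to the string.
            extra = sum(x != y for x, y in zip(c[m - d:], t[p + m:q + m]))
            best[q] = min(best[q], best[p] + extra)
    return best[n - m]


def brute_force_approx_cover(t: str, m: int) -> Tuple[Optional[str], float]:
    """Try every length-m string over the letters of t; keep the best one."""
    best_c, best_err = None, INF
    for letters in product(sorted(set(t)), repeat=m):
        c = "".join(letters)
        err = min_cover_errors(c, t)
        if err < best_err:
            best_c, best_err = c, err
    return best_c, best_err


cover, errors = brute_force_approx_cover("abcababaabb", 3)
```

The inner routine is a small dynamic program over the start position of the last occurrence. The outer loop is the expensive part, which fits the hardness result stated for the case where the cover length is part of the input.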

Our Results
Preliminaries
NP-Hardness of the ACP
The Reduction from 3-SAT
The Partial-Tiling Relaxation of the ACP
The Histogram Greedy Algorithm
The Partial-Tiling Primitivity Coercion Algorithm
The Full-Tiling Relaxation of the ACP
The Full-Tiling Primitivity Coercion Algorithm
Experimental Tests of the Full-Tiling Relaxation Algorithm
Open Problems