Abstract

An α-gapped repeat (α ≥ 1) in a word w is a factor uvu of w such that |uv| ≤ α|u|; the two occurrences of u are called arms of this α-gapped repeat. An α-gapped repeat is called maximal if its arms cannot be extended simultaneously with the same character to the right nor to the left. We show that the number of all maximal α-gapped repeats occurring in words of length n is upper bounded by 18αn. In the case of α-gapped palindromes, i.e., factors $uvu^{\intercal}$ with |uv| ≤ α|u|, we show that the number of all maximal α-gapped palindromes occurring in words of length n is upper bounded by 28αn + 7n. Both upper bounds allow us to construct algorithms finding all maximal α-gapped repeats and/or all maximal α-gapped palindromes of a word of length n on an integer alphabet of size $n^{\mathcal{O}(1)}$ in $\mathcal{O}(\alpha n)$ time. The presented running times are optimal since there are words that have Θ(αn) maximal α-gapped repeats/palindromes.
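To make the definition concrete, here is a minimal brute-force sketch in Python that enumerates maximal α-gapped repeats directly from the definition above. It is not the $\mathcal{O}(\alpha n)$ algorithm of the paper; it runs in roughly cubic time and is only meant for checking small examples. The function name `maximal_gapped_repeats` and the output format `(i, j, ell)` (start of the left arm, start of the right arm, arm length) are illustrative choices. The sketch assumes that the gap v may be empty (as the case α = 1 requires) and that "extending simultaneously with the same character" means comparing the characters directly before, respectively directly after, both arms.

```python
def maximal_gapped_repeats(w, alpha):
    """Brute-force enumeration of maximal alpha-gapped repeats of w.

    Returns a list of (i, j, ell): the left arm is w[i:i+ell], the right
    arm is w[j:j+ell], and the period j - i equals |uv|.
    Assumption: arms must not overlap, but the gap v may be empty.
    """
    n = len(w)
    found = []
    for i in range(n):
        for j in range(i + 1, n):
            # Arms may not overlap (ell <= j - i) and must fit in w (ell <= n - j).
            for ell in range(1, min(j - i, n - j) + 1):
                if w[i + ell - 1] != w[j + ell - 1]:
                    break  # longer arms starting at i and j cannot match either
                if j - i > alpha * ell:
                    continue  # |uv| <= alpha * |u| is violated
                # Can both arms be extended by the same character?
                left_ext = i > 0 and w[i - 1] == w[j - 1]
                right_ext = j + ell < n and w[i + ell] == w[j + ell]
                if not left_ext and not right_ext:
                    found.append((i, j, ell))
    return found


if __name__ == "__main__":
    # "ab" + "c" + "ab": arms of length 2, period 3 <= 3 * 2
    print(maximal_gapped_repeats("abcab", 3))  # [(0, 3, 2)]
```

In `"abcab"` the pair of single-character arms `a … a` also satisfies the length condition for α = 3, but it is not maximal because both arms can be extended to the right by `b`; only the repeat with arms `ab` is reported.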

Highlights

  • Gapped repeats and palindromes are repetitive structures occurring in words that were investigated extensively within theoretical computer science with motivation coming especially from the analysis of DNA and RNA structures, modelling different types of tandem and interspersed repeats as well as hairpin structures; such structures are important in analyzing the structural and functional information of the genetic sequences. Let $w^{\intercal}$ denote the reversed word of a word w

  • In the case of α-gapped palindromes, i.e., factors $uvu^{\intercal}$ with |uv| ≤ α|u|, we show that the number of all maximal α-gapped palindromes occurring in words of length n is upper bounded by 28αn + 7n (Theory Comput Syst (2018) 62:162–191)

  • In [19], Kolpakov et al. introduced the notion of α-gapped repeats, and showed that the set $G_\alpha(w)$ of all maximal α-gapped repeats can be computed in $\mathcal{O}(\alpha^2 n + |G_\alpha(w)|)$ time for integer alphabets


Summary

Introduction

Gapped repeats and palindromes are repetitive structures occurring in words that were investigated extensively within theoretical computer science (see, e.g., [3, 5, 6, 7, 8, 10, 14, 17, 18, 19, 22] and the references therein) with motivation coming especially from the analysis of DNA and RNA structures, modelling different types of tandem and interspersed repeats as well as hairpin structures; such structures are important in analyzing the structural and functional information of the genetic sequences (see, e.g., [3, 14, 18]). Kolpakov and Kucherov [18] introduced the notion of long-armed palindromes (equivalently, 2-gapped palindromes), and showed how to compute the set $G_2(w)$ of all maximal 2-gapped palindromes in $\mathcal{O}(n + |G_2(w)|)$ time for an input word w of length n over a constant alphabet. They left open the question of how large $G_2(w)$ can be. This problem was recently investigated in [1].
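For small inputs, the set of maximal α-gapped palindromes (with α = 2 giving the long-armed palindromes above) can be explored with a brute-force sketch like the following. This is not the algorithm of [18] or of this paper. The excerpt here only defines maximality for repeats, so the sketch makes a labelled assumption for palindromes: a gapped palindrome counts as maximal if its arms can be extended neither outward (left arm to the left, right arm to the right) nor inward (into the gap, which requires a gap of at least two characters) by the same character. The name `maximal_gapped_palindromes` and the output format `(i, ell, gap)` are hypothetical.

```python
def maximal_gapped_palindromes(w, alpha):
    """Brute-force enumeration of maximal alpha-gapped palindromes of w.

    Returns a list of (i, ell, gap): the left arm is u = w[i:i+ell], it is
    followed by a gap of `gap` characters and then by the reverse of u.
    Assumption (not taken from the text): maximal means no outward and no
    inward extension of both arms by the same character is possible.
    """
    n = len(w)
    found = []
    for i in range(n):                 # first position of the left arm
        for e in range(i + 1, n):      # last position of the right arm
            total = e - i + 1          # length of the whole factor
            for ell in range(1, total // 2 + 1):
                # Arm character k from the left must mirror k from the right.
                if w[i + ell - 1] != w[e - ell + 1]:
                    break
                gap = total - 2 * ell
                if ell + gap > alpha * ell:
                    continue           # |uv| <= alpha * |u| is violated
                outward = i > 0 and e + 1 < n and w[i - 1] == w[e + 1]
                inward = gap >= 2 and w[i + ell] == w[e - ell]
                if not outward and not inward:
                    found.append((i, ell, gap))
    return found


if __name__ == "__main__":
    # "ab" + "xyz" + "ba": arms of length 2, |uv| = 5 <= 3 * 2
    print(maximal_gapped_palindromes("abxyzba", 3))  # [(0, 2, 3)]
```

With α = 2 the same word yields no result, since |uv| = 5 exceeds 2|u| = 4 for the arms `ab` / `ba`, and the shorter arms `a` / `a` and `b` / `b` have even larger gaps relative to their length.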

Combinatorics on Words
Point Analysis
Upper Bound on the Number of Maximal α-gapped Repeats
Upper Bound on the Number of Maximal α-gapped Palindromes
Finding All Maximal α-gapped Repeats
Conclusion
