Abstract
Given an input sequence of data, a pattern is a repeating sequence, possibly interspersed with don't-care characters. In practice, the patterns or motifs of interest are the ones that also allow a variable number of gaps (or don't-care characters): we call these the flexible motifs. The number of rigid motifs could potentially be exponential in the size of the input sequence, and in the case where the input is a sequence of real numbers, there could be an uncountably infinite number of motifs (assuming two real numbers are equal if they are within some δ > 0 of each other). It has been shown earlier that, by suitably defining the notions of maximality and redundancy, there exists only a linear number (no more than 3n) of irredundant motifs, together with a polynomial-time algorithm to detect them. Here we present a uniform framework that encompasses both rigid and flexible motifs, with generalizations to sequences of sets and of real numbers, and show the somewhat surprising result that the number of irredundant flexible motifs still has a linear bound. However, the algorithm to detect them has a higher complexity than that for rigid motifs.
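To make the distinction between rigid and flexible motifs concrete, the following minimal sketch locates occurrences of a motif in a character sequence. It is an illustration only, not the detection algorithm of the paper: the encoding of a motif as a list of solid characters and `(lo, hi)` gap tuples, and the function names `motif_to_regex` and `occurrences`, are assumptions introduced here.

```python
import re


def motif_to_regex(motif):
    """Translate a motif into a regular expression string.

    Hypothetical encoding: each element is either a single solid character
    or a (lo, hi) tuple meaning "between lo and hi don't-care characters".
    A rigid motif uses only gaps with lo == hi; a flexible motif allows lo < hi.
    """
    parts = []
    for element in motif:
        if isinstance(element, tuple):
            lo, hi = element
            parts.append(".{%d,%d}" % (lo, hi))
        else:
            parts.append(re.escape(element))
    return "".join(parts)


def occurrences(motif, sequence):
    """Return every start position at which the motif occurs in the sequence.

    The lookahead makes each match zero-width, so overlapping
    occurrences are all reported.
    """
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")", re.DOTALL)
    return [m.start() for m in pattern.finditer(sequence)]


sequence = "abxcdabyycd"

# Rigid motif: exactly one don't-care character between "ab" and "cd".
rigid = ["a", "b", (1, 1), "c", "d"]

# Flexible motif: one or two don't-care characters between "ab" and "cd".
flexible = ["a", "b", (1, 2), "c", "d"]

print(occurrences(rigid, sequence))     # [0]
print(occurrences(flexible, sequence))  # [0, 5]
```

In this toy input the rigid motif occurs only once, whereas allowing a variable gap lets the flexible motif also cover the second occurrence at position 5.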