Abstract

In this paper we consider sets of factors of a given finite word over a finite alphabet that allow the entire word to be reconstructed. The analysis is based on the notion of special factor. A factor u of a finite word w is called right (resp. left) special if there exist two distinct letters x and y such that ux and uy (resp. xu and yu) are factors of w. A factor is bispecial if it is both right and left special. A proper box of w is any factor of w of the form asb, where a and b are letters and s is a bispecial factor of w. The initial (resp. terminal) box of w is the shortest prefix (resp. suffix) of w that is an unrepeated factor. A box is called maximal if it is not a proper factor of another box. The main result of the paper is the following theorem (maximal box theorem): any finite word w is uniquely determined by its initial box, its terminal box and the set of its maximal boxes. As a consequence, a finite word w is uniquely determined by its factors up to length n = max{R_w, K_w} + 1, where K_w is the length of the terminal box and R_w is the minimal natural number for which there is no right special factor of length R_w. Some structural properties of boxes are studied. Another important combinatorial notion is that of superbox. A superbox is any factor of w of the form asb, where a and b are letters, s is a repeated factor, and the factors as and sb are both unrepeated. A theorem for superboxes analogous to the maximal box theorem is proved. Some algorithms that construct boxes and superboxes and, conversely, reconstruct the word from them are given. In this combinatorial framework we give an upper and a lower bound on the number of states of the minimal deterministic automaton recognizing the set of factors of w. These bounds are sharper than the previously known bounds.
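The definitions above can be made concrete with a brute-force sketch. It is not one of the algorithms given in the paper; it simply enumerates all factors of a short non-empty word and applies the definitions literally. All function names are hypothetical, the empty word is assumed to count as a factor (so it can be special), and `maximal` takes whatever collection of boxes the caller supplies rather than fixing which boxes take part in the comparison.

```python
def factors(w):
    """All factors (substrings) of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def occurrences(w, u):
    """Number of (possibly overlapping) occurrences of u in w."""
    return sum(1 for i in range(len(w) - len(u) + 1) if w.startswith(u, i))

def right_special(w):
    """Factors u such that ux and uy are factors for two distinct letters x, y."""
    facts, letters = factors(w), set(w)
    return {u for u in facts if sum(u + x in facts for x in letters) >= 2}

def left_special(w):
    """Factors u such that xu and yu are factors for two distinct letters x, y."""
    facts, letters = factors(w), set(w)
    return {u for u in facts if sum(x + u in facts for x in letters) >= 2}

def bispecial(w):
    return right_special(w) & left_special(w)

def proper_boxes(w):
    """Factors of the form asb with a, b letters and s bispecial."""
    bs = bispecial(w)
    return {f for f in factors(w) if len(f) >= 2 and f[1:-1] in bs}

def initial_box(w):
    """Shortest prefix of w occurring exactly once in w (w assumed non-empty)."""
    for k in range(1, len(w) + 1):
        if occurrences(w, w[:k]) == 1:
            return w[:k]

def terminal_box(w):
    """Shortest suffix of w occurring exactly once in w (w assumed non-empty)."""
    for k in range(1, len(w) + 1):
        if occurrences(w, w[-k:]) == 1:
            return w[-k:]

def maximal(boxes):
    """Members of `boxes` that are not a proper factor of another member."""
    return {b for b in boxes if not any(b != c and b in c for c in boxes)}

def r_w(w):
    """Smallest length at which w has no right special factor."""
    lengths = {len(u) for u in right_special(w)}
    n = 0
    while n in lengths:
        n += 1
    return n

def reconstruction_length(w):
    """The bound n = max{R_w, K_w} + 1 stated in the abstract."""
    return max(r_w(w), len(terminal_box(w))) + 1
```

For the word abaab, this sketch gives the bispecial factors "" and a, the proper boxes ab, ba, aa, aab and baa, the initial box aba, the terminal box aab, R_w = 2, K_w = 3 and hence n = 4. The enumeration is quadratic in the number of factors, so it is only suitable for checking small examples by hand.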

