A set of words, also called a language, is letter-balanced if the number of occurrences of each letter only depends on the length of the word, up to a constant. Similarly, a language is factor-balanced if the difference of the number of occurrences of any given factor in words of the same length is bounded. The most prominent example of a letter-balanced but not factor-balanced language is given by the Thue–Morse sequence. We establish connections between the two notions, in particular for languages given by substitutions and, more generally, by sequences of substitutions. We show that the two notions essentially coincide when the sequence of substitutions is proper. For the class of Thue–Morse–Sturmian languages, we give a full characterisation of factor-balancedness.
Read full abstract