The definition of $k^{\text{th}}$-order empirical entropy of strings is extended to node-labelled binary trees: the proposed notion of $k^{\text{th}}$-order empirical entropy for node-labelled binary trees captures regularities in both the labels and the structure of a tree. A suitable binary encoding of tree straight-line programs (which have previously been used for grammar-based tree compression) is shown to yield binary tree encodings whose size is bounded by the $k^{\text{th}}$-order empirical entropy plus lower-order terms. This result is then extended from node-labelled binary trees to node-labelled unranked trees, which generalizes recent results on grammar-based string compression to grammar-based tree compression. Additionally, experimental results on real XML document trees are presented, in which the proposed $k^{\text{th}}$-order empirical tree entropy is computed and compared with the performance of grammar-based tree compressors on those trees.
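For orientation, the string notion that is being extended can be sketched in a few lines. The following is a minimal illustration of the commonly used definition of $k^{\text{th}}$-order empirical entropy for strings, not the tree entropy proposed here; the function name `empirical_entropy_k` is hypothetical, and the handling of the first $k$ symbols (they are simply skipped because they have no full context) is an assumption, since conventions differ between papers.

```python
from collections import Counter, defaultdict
from math import log2


def empirical_entropy_k(s: str, k: int) -> float:
    """Return a k-th order empirical entropy of s in bits per symbol.

    Assumption: the first k symbols have no full context and are skipped.
    """
    n = len(s)
    if n == 0:
        return 0.0

    # Group every symbol by the k symbols that precede it (its context).
    followers = defaultdict(Counter)
    for i in range(k, n):
        followers[s[i - k:i]][s[i]] += 1

    total_bits = 0.0
    for counts in followers.values():
        m = sum(counts.values())
        # m times the zeroth-order entropy of the symbols seen after this context
        total_bits += sum(c * log2(m / c) for c in counts.values())

    # Normalize by the length of the whole string.
    return total_bits / n
```

For the string `abab`, this sketch gives 1 bit per symbol at order 0 and 0 bits per symbol at order 1, because each symbol is fully determined by its predecessor.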