Abstract

Binary unlabeled ordered trees (hereafter called binary trees) have been studied at least since Euler, who enumerated them. The number of such trees with n nodes is now known as the Catalan number. Over the years, various questions about the statistics of such trees have been investigated (e.g., the height and path length distributions of a randomly selected tree). Binary trees have an abundance of applications in computer science. Recently, however, Seroussi posed a new and interesting problem motivated by information theory: how many binary trees of a given path length (sum of depths) are there? This question arose in the study of universal types of sequences. Two sequences of length p have the same universal type if they generate the same set of phrases in the incremental parsing of the Lempel-Ziv'78 scheme; the notion is justified by the fact that such sequences can be shown to converge to the same empirical distribution. It turns out that the number of distinct types of sequences of length p corresponds to the number T_p of binary (unlabeled and ordered) trees with path length p (and also to the number of distinct Lempel-Ziv'78 parsings of sequences of length p). We first show that the number of binary trees with path length p satisfies $T_p = 2^{(2p/\log_2 p)(1+O(\log^{-2/3} p))}$. We then establish various limiting distributions for the number of nodes (the number of phrases in the Lempel-Ziv'78 scheme) when a tree is selected at random among all trees with path length p. Throughout, we use methods of analytic algorithmics, such as generating functions and complex asymptotics, as well as methods of applied mathematics, such as the WKB method and matched asymptotics.
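
To make the link between parsings and trees concrete, here is a minimal Python sketch of the incremental (Lempel-Ziv'78) parsing. It assumes binary sequences written as strings over '0' and '1' and, for simplicity, only compares sequences whose parse ends on a complete phrase; the helper names are hypothetical and not taken from the paper. When the parse is complete, the phrases form a prefix-closed set whose trie, rooted at the empty phrase, has path length equal to the sequence length p. For instance, 1001 and 0101 both parse into the phrase set {0, 1, 01}, so they share a universal type.

```python
def lz78_parse(seq):
    """Incremental (Lempel-Ziv'78) parsing.

    Each new phrase is the shortest prefix of the unparsed input that has not
    appeared as a phrase before. Returns (phrases, tail), where tail is a
    trailing piece that repeats an earlier phrase (empty if the parse is complete).
    """
    seen = set()
    phrases = []
    cur = ""
    for ch in seq:
        cur += ch
        if cur not in seen:  # shortest new prefix found: close the phrase
            seen.add(cur)
            phrases.append(cur)
            cur = ""
    return phrases, cur


def same_universal_type(x, y):
    """Hypothetical helper: two completely parsed sequences share a universal
    type when their LZ78 phrase sets coincide."""
    px, tx = lz78_parse(x)
    py, ty = lz78_parse(y)
    if tx or ty:
        raise ValueError("this sketch assumes parses that end on a complete phrase")
    return set(px) == set(py)


def parse_tree_path_length(phrases):
    """Hypothetical helper: each phrase extends an earlier phrase by one symbol,
    so the phrase set is prefix-closed and, with the empty phrase as root, forms
    a trie in which a phrase of length k sits at depth k. The trie's path length
    (sum of depths) is therefore the total length of the phrases."""
    phrase_set = set(phrases)
    if any(len(ph) > 1 and ph[:-1] not in phrase_set for ph in phrases):
        raise ValueError("phrase set is not prefix-closed")
    return sum(len(ph) for ph in phrases)


if __name__ == "__main__":
    print(lz78_parse("1001"))                        # (['1', '0', '01'], '')
    print(same_universal_type("1001", "0101"))       # True
    print(parse_tree_path_length(["1", "0", "01"]))  # 4
```

A set is used for the dictionary because the parse only needs membership tests; a real LZ78 coder would instead store the trie explicitly and emit (parent index, symbol) pairs.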

Summary

Summary of Results

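We let b(n, p) denote the number of binary trees with n nodes and path length p. Splitting a tree at its root into a left subtree with i nodes and a right subtree with n - 1 - i nodes, and noting that each of the n - 1 non-root nodes lies one level deeper than it does in its own subtree, shows that this function satisfies the recurrence relation

$$b(n,p) = \sum_{i=0}^{n-1} \sum_{r=0}^{p-n+1} b(i,r)\, b(n-1-i,\, p-n+1-r), \qquad n \ge 1,$$

with b(0, 0) = 1 and b(0, p) = 0 for p > 0. We shall mostly analyze (2.1) and obtain asymptotic results for b(n, p) by expanding the Cauchy integral (cf. [24]). Here →d denotes convergence in distribution, and A is a random variable with the Airy distribution, which is characterized by its moments [7]. It is a random variable distributed as [...]. We shall compute this distribution asymptotically and obtain the asymptotic structure of b(n, p) for various ranges of n and p. We now formulate our main result concerning the cardinality of T_p.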
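
For small p, the recurrence b(n, p) = Σ_i Σ_r b(i, r) b(n - 1 - i, p - n + 1 - r), with b(0, 0) = 1, can be tabulated directly. The following is a minimal sketch (function names are hypothetical, not from the paper) that fills a table of b(n, p) for p ≤ max_p, using the fact that a tree with n ≥ 1 nodes has path length at least n - 1, and then sums over n to get T_p, excluding the empty tree (which only matters at p = 0). As a sanity check, summing b(n, p) over p recovers the Catalan number whenever max_p ≥ n(n - 1)/2, the largest possible path length of an n-node tree.

```python
from math import comb


def path_length_table(max_p):
    """Hypothetical helper: b[n][p] = number of binary trees with n nodes and
    path length p, for 0 <= p <= max_p, via
        b(n, p) = sum_i sum_r b(i, r) * b(n - 1 - i, p - (n - 1) - r),  b(0, 0) = 1.
    """
    max_n = max_p + 1  # a tree with n >= 1 nodes has path length >= n - 1
    b = [[0] * (max_p + 1) for _ in range(max_n + 1)]
    b[0][0] = 1  # the empty tree
    for n in range(1, max_n + 1):
        shift = n - 1  # each non-root node is one level deeper than in its subtree
        for i in range(n):  # i nodes on the left, n - 1 - i on the right
            j = n - 1 - i
            for r1 in range(max_p + 1 - shift):
                if b[i][r1] == 0:
                    continue
                for r2 in range(max_p + 1 - shift - r1):
                    if b[j][r2]:
                        b[n][r1 + r2 + shift] += b[i][r1] * b[j][r2]
    return b


def trees_with_path_length(max_p):
    """T_p for p = 0..max_p, excluding the empty tree (this only affects p = 0)."""
    b = path_length_table(max_p)
    return [sum(b[n][p] for n in range(1, len(b))) for p in range(max_p + 1)]


if __name__ == "__main__":
    b = path_length_table(10)
    # Every tree with n <= 5 nodes has path length <= n(n-1)/2 <= 10,
    # so summing over p must give the Catalan number C_n.
    for n in range(6):
        assert sum(b[n]) == comb(2 * n, n) // (n + 1)
    print(trees_with_path_length(6))  # [1, 2, 1, 4, 4, 2, 14]
```

This direct convolution is only practical for small p; it is meant as a cross-check on exact values, not as a substitute for the asymptotic analysis.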

Result
The following matching asymptotics hold:
Far Right Region
Right Region
Central Region
Moment equations
Analysis of the basic recurrence
Transform inversion
Left Region
Far Left Region
The Matching Region Between the Left and Far Left Scales
Numerical Studies
Findings
Appendix A