Abstract

Digital trees, also known as $\textit{``tries''}$, are fundamental to a number of algorithmic schemes, including radix-based searching and sorting, lossless text compression, dynamic hashing algorithms, communication protocols of the tree or stack type, distributed leader election, and so on. This extended abstract develops the asymptotic form of the expectations of the main parameters of interest, such as tree size and path length. The analysis is conducted under the simplest of all probabilistic models, namely the $\textit{memoryless source}$, under which the letters composing each data item are drawn independently from a fixed (finite) probability distribution. The precise asymptotic structure of the parameters' expectations is shown to depend on fine singular properties, in the complex plane, of a ubiquitous $\textit{Dirichlet series}$. Consequences include the characterization of a broad range of asymptotic regimes for the error terms associated with trie parameters, as well as a classification that depends on specific $\textit{arithmetic properties}$, especially irrationality measures, of the sources under consideration.

Highlights

  • Digital trees, also known as “tries”, serve to represent finite collections of words over some finite alphabet: each subtree stemming directly from the root is associated with the subcollection of words starting with a given letter; each subtree at level two corresponds to a given prefix of length two, and so on

  • As noted early [7, 8, 18, 36], quantifying the main parameters of the digital tree is strongly dependent upon the location of poles in the complex plane of the fundamental

  • The paper by Fayolle et al. [8] seems to have been the first to conduct a detailed discussion of the geometry of poles and related integration contours, with the “periodicity criterion” explicitly enunciated. As was recognized in subsequent years, largely by Jacquet, Louchard, and Szpankowski, digital tree analyses can serve as the basis of a remarkably precise understanding of the Lempel and Ziv schemes for data compression

Summary

Introduction

Digital trees, also known as “tries”, serve to represent finite collections of words over some finite alphabet: each subtree stemming directly from the root is associated with the subcollection of words starting with a given letter; each subtree at level two corresponds to a given prefix of length two, and so on. The paper by Fayolle et al. [8] seems to have been the first to conduct (in the binary case) a detailed discussion of the geometry of poles and related integration contours, with the “periodicity criterion” explicitly enunciated (cf. Theorem 1). As was recognized in subsequent years, largely by Jacquet, Louchard, and Szpankowski (see, e.g., [17, 23]), digital tree analyses can serve as the basis of a remarkably precise understanding of the Lempel and Ziv schemes for data compression. Similar comments apply to renewal theory and dynamical systems theory, where the periodicity–aperiodicity dichotomy (Section 2) plays a role: we refer to the works of Pollicott [29, p. 143], as well as Baladi, Cesaratto, and Vallée [3, 4, 6, 38], for a dynamical discussion.
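To make the recursive definition above concrete, the following minimal Python sketch builds a trie implicitly and returns two of the parameters the abstract mentions: size (counted here as the number of internal nodes) and path length (the sum of leaf depths). It is an illustration under stated assumptions, not the paper's method: the function names `trie_parameters` and `memoryless_words` are hypothetical, the words are assumed to be distinct and of equal length, and the fixed word length of 64 letters is an arbitrary stand-in for the infinite words of the memoryless-source model.

```python
import random
from collections import defaultdict


def trie_parameters(words, depth=0):
    """Return (size, path_length) of the trie built on `words`.

    Assumes the words are distinct and all of the same length, so that
    any group of two or more words is eventually split by some letter.
    size        = number of internal (branching) nodes
    path_length = sum of the depths of the leaves
    """
    # An empty subcollection contributes nothing; a single word is a leaf.
    if len(words) <= 1:
        return 0, depth * len(words)

    # Split the subcollection according to the letter at position `depth`.
    groups = defaultdict(list)
    for w in words:
        groups[w[depth]].append(w)

    size, path_length = 1, 0  # count the current internal node
    for sub in groups.values():
        s, p = trie_parameters(sub, depth + 1)
        size += s
        path_length += p
    return size, path_length


def memoryless_words(n, probs, length=64, seed=0):
    """Draw `n` distinct words whose letters are independent samples from
    the fixed distribution `probs` (a dict mapping letter -> probability).
    """
    rng = random.Random(seed)
    letters = list(probs)
    weights = [probs[a] for a in letters]
    words = set()
    while len(words) < n:
        words.add("".join(rng.choices(letters, weights=weights, k=length)))
    return sorted(words)
```

For instance, `trie_parameters(["00", "01", "10"])` yields a size of 2 and a path length of 5: the root and the node for prefix `0` are internal, while the three leaves sit at depths 2, 2, and 1. Averaging the output over many samples from `memoryless_words` gives empirical estimates of the expectations whose asymptotic form the paper studies.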

Statement of the main result
Ladders and poles
Proof of Theorem 2
Error bounds
Rational probabilities and metric aspects
Asymptotic analysis of tries
Invariance of the irrationality exponent
B Numerical aspects