Partial fillup and search time in LC tries

Svante Janson, Wojciech Szpankowski

doi:10.1145/1290672.1290681

Abstract

In 1993, Andersson and Nilsson introduced the level-compressed trie (LC trie for short), in which a full subtree of a node is compressed into a single node whose degree equals the size of the subtree. Recent experimental results indicated a “dramatic improvement” when full subtrees are replaced by “partially filled subtrees.” In this article, we provide a theoretical justification of these experimental results, showing, among other things, a rather moderate improvement in search time over the original LC tries. For this analysis, we assume that $n$ strings are generated independently by a binary memoryless source, with $p$ denoting the probability of emitting a “1” (and $q = 1 - p$). We first prove that the so-called α-fillup level $F_n(\alpha)$ (i.e., the largest level of the trie at which an α fraction of the nodes is present) is concentrated on two values with high probability: either $F_n(\alpha) = k_n$ or $F_n(\alpha) = k_n + 1$, where $k_n = \log_{1/\sqrt{pq}} n - \frac{|\ln(p/q)|}{2 \ln^{3/2}(1/\sqrt{pq})} \Phi^{-1}(\alpha) \sqrt{\ln n} + O(1)$ is an integer and $\Phi(x)$ denotes the normal distribution function. This result directly yields the typical depth (search time) $D_n(\alpha)$ in the α-LC tries: we show that with high probability $D_n(\alpha) \sim C_2 \log\log n$, where $C_2 = 1/|\log(1 - h/\log(1/\sqrt{pq}))|$ for $p \neq q$ and $h = -p \log p - q \log q$ is the Shannon entropy rate. This should be compared with the recently found typical depth in the original LC tries, which is $C_1 \log\log n$, where $C_1 = 1/|\log(1 - h/\log(1/\min\{p, 1-p\}))|$. In conclusion, we observe that α affects only the lower-order term of the α-fillup level $F_n(\alpha)$, and the search time in α-LC tries is of the same order as in the original LC tries.
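
To get a feel for the constants stated in the abstract, the following minimal Python sketch evaluates C_1, C_2, and the two explicit terms of k_n for a given p. It is illustrative only and not taken from the article: the function names are hypothetical, natural logarithms are assumed throughout (the ratio h/log(·) does not depend on the base, as long as the outer logarithm and log log n use the same base), and the unspecified O(1) term of k_n is simply dropped, so the last function does not return the integer k_n itself.

```python
import math
from statistics import NormalDist


def _check_p(p: float) -> None:
    # Assumption: the formulas are evaluated only for 0 < p < 1 and p != q.
    if not 0.0 < p < 1.0 or p == 0.5:
        raise ValueError("need 0 < p < 1 and p != 1/2")


def entropy(p: float) -> float:
    """Shannon entropy rate h = -p log p - q log q (natural log assumed)."""
    q = 1.0 - p
    return -p * math.log(p) - q * math.log(q)


def c1(p: float) -> float:
    """Constant of log log n for the original LC tries, as stated in the abstract."""
    _check_p(p)
    ratio = entropy(p) / math.log(1.0 / min(p, 1.0 - p))
    return 1.0 / abs(math.log(1.0 - ratio))


def c2(p: float) -> float:
    """Constant of log log n for the alpha-LC tries, as stated in the abstract."""
    _check_p(p)
    q = 1.0 - p
    ratio = entropy(p) / math.log(1.0 / math.sqrt(p * q))
    return 1.0 / abs(math.log(1.0 - ratio))


def fillup_level_explicit_terms(n: int, p: float, alpha: float) -> float:
    """First two terms of k_n; the O(1) term is unknown here and omitted."""
    _check_p(p)
    q = 1.0 - p
    base = math.log(1.0 / math.sqrt(p * q))      # ln(1/sqrt(pq))
    leading = math.log(n) / base                 # log_{1/sqrt(pq)} n
    coeff = abs(math.log(p / q)) / (2.0 * base ** 1.5)
    z = NormalDist().inv_cdf(alpha)              # Phi^{-1}(alpha)
    return leading - coeff * z * math.sqrt(math.log(n))


if __name__ == "__main__":
    p = 0.3
    print("C1 =", c1(p), "C2 =", c2(p))
    for alpha in (0.25, 0.5, 0.75):
        print(alpha, fillup_level_explicit_terms(10**6, p, alpha))
```

Because log(1/√(pq)) never exceeds log(1/min{p, 1 − p}), this computation gives C_2 below C_1 for every p ≠ 1/2, while both depths remain of order log log n, in line with the "rather moderate improvement" described above.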
