Assembling approximately optimal binary search trees efficiently using arithmetics

Jussi Kujala

doi:10.1016/j.ipl.2008.08.012

Abstract

We introduce a new algorithm for computing an approximately optimal binary search tree with known access probabilities or weights on items. The algorithm is simple to implement and it has two contributions. First, a randomized variant of the algorithm produces a binary search tree with expected performance that improves the previous theoretical guarantees (the performance is dependent on the value of the input random variable). More precisely, if p is the probability of accessing an item, then under expectation the item is found after searching lg 1 / p + 0.087 + lg 2 ( 1 + p max ) nodes, where p max is the maximal probability among items. The previous best bound was lg 1 / p + 1 , albeit deterministic. For the optimal tree our upper bound implies a non-constructive performance bound of H + 0.087 + lg 2 ( 1 + p max ) , where H is the entropy on the item distribution and the previous bound was H + 1 . The second contribution of the algorithm is a low cost in i/o models of cost such as the cache-oblivious model, while attaining simultaneously the above bound for the produced tree.

Full Text