K-tree/forest

Rakesh M Verma,Sanjiv Behl

doi:10.1145/564376.564480

Abstract

In Information Retrieval it is well-known that the complexity of processing boolean queries depends on the size of the intermediate results, which could be huge (and are typically on disk) even though the size of the final result may be quite small. In the case of inverted files the most time consuming operation is the merging or intersection of the list of occurrences [1]. We propose, the Keyword tree (K-tree) and forest, efficient structures to handle boolean queries in keyword-based information retrieval. Extensive simulations show that K-tree is orders-of-magnitude faster (i.e., far fewer I/O's) for boolean queries than the usual approach of merging the lists of occurrences and incurs only a small overhead for single keyword queries. The K-tree can be efficiently parallelized as well. The construction cost of K-tree is comparable to the cost of building inverted files.

Full Text