Abstract

In information retrieval, it is well known that the cost of processing Boolean queries depends on the size of the intermediate results, which can be huge (and typically reside on disk) even when the final result is quite small. With inverted files, the most time-consuming operation is merging or intersecting the lists of occurrences [1]. We propose the Keyword tree (K-tree) and forest, efficient structures for handling Boolean queries in keyword-based information retrieval. Extensive simulations show that the K-tree is orders of magnitude faster (i.e., requires far fewer I/Os) for Boolean queries than the usual approach of merging the lists of occurrences, and that it incurs only a small overhead for single-keyword queries. The K-tree can also be parallelized efficiently. Its construction cost is comparable to that of building inverted files.
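To make the baseline concrete, the following is a minimal sketch of the usual inverted-file approach that the abstract compares against: an AND query answered by intersecting sorted lists of occurrences. It does not show the K-tree itself, whose layout is not described here. The names `intersect`, `boolean_and`, and the toy `index` are hypothetical, and the in-memory lists stand in for occurrence lists that would normally be read from disk.

```python
def intersect(a, b):
    """Intersect two sorted lists of document ids with a linear merge."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def boolean_and(index, keywords):
    """Answer an AND query; also report how many list entries were read.

    `index` maps each keyword to a sorted list of document ids
    (an assumed, simplified stand-in for an on-disk inverted file).
    """
    # Process the shortest lists first to keep intermediate results small.
    lists = sorted((index.get(k, []) for k in keywords), key=len)
    if not lists:
        return [], 0
    result = lists[0]
    scanned = len(result)
    for postings in lists[1:]:
        scanned += len(postings)  # every entry of each list is still read
        result = intersect(result, postings)
    return result, scanned


# Toy data, invented for illustration only.
index = {
    "tree": [1, 2, 3, 5, 8, 13, 21],
    "query": [2, 3, 5, 7, 11, 13],
    "boolean": [5, 13, 34],
}
result, scanned = boolean_and(index, ["tree", "query", "boolean"])
print(result, scanned)
```

Even in this toy case the query reads every entry of all three lists to produce a two-document answer, which is the mismatch between work done and result size that the abstract describes.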

