Abstract

Storing and querying high-dimensional data are important problems in designing an information retrieval system. Two crucial issues, time and space efficiencies, must be considered when evaluating the performance of such a system. The KDB-tree and its variants have been reported to have good performance by using them as the index structure for retrieving multidimensional data. However, they all suffer from low storage utilization problem caused by imperfect “splitting policies.” Unnecessary splits increase the size of the index structure and deteriorate the performance of the system. In this paper, a new data insertion algorithm with a better splitting policy was proposed, which arranges data entries in the leaf nodes as many as possible. Our new index scheme can increase the storage utilization up to nearly 100% and reduce the index size to a smaller scale. As a result, both time and space efficiencies are significantly improved. Analytical and experimental results show that our indexing method outperforms the traditional KDB-tree and its variants.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call