Pavo: A RNN-Based Learned Inverted Index, Supervised or Unsupervised?

Wenkun Xiang, Hao Zhang, Keqin Li, Rui Cui, Xing Chu, Wei Zhou

doi:10.1109/access.2018.2885350

Abstract

The boom in big data and graphics processing unit (GPU) technologies has allowed us to explore more appropriate data structures and algorithms with lower time complexity. However, the application of machine learning, especially deep learning, as a potential alternative to traditional data structures is relatively new and largely uncharted territory. In this paper, we propose a novel recurrent neural network-based learned inverted index, called Pavo, to organize inverted data efficiently and flexibly. The basic hash function of the traditional inverted index is replaced by a hierarchical neural network, which allows Pavo to adapt well to various data distributions while achieving a lower collision rate and a higher space utilization rate. A particular feature of our approach is a novel unsupervised learning strategy for constructing the hash function. To the best of our knowledge, no similar results exist in the literature in which an unsupervised learning strategy is employed to design hash functions. Extensive experimental results show that the unsupervised model has some advantages over the supervised one. Our approaches not only demonstrate the feasibility of deep learning-based data structures for indexing but also help developers make more accurate decisions on the design and configuration of data organization and operations and on the parameter tuning of neural networks, so as to improve the performance of information searching.

Full Text