Improving Multi-set Query Processing Via a Learned Oracle

Jingwen Cai, Wenbin He, Xian Zhang, Lingli Li, Yu Li

doi:10.1145/3393527.3393534

Abstract

Multi-set query is a fundamental problem in computer systems and applications. Most traditional solutions for multi-set query are based on hash tables or Bloom filters. However, when the multi-sets are large, these solutions cannot achieve small memory usage, fast query speed, and high accuracy at the same time. In this work, we empirically study how a learned oracle can improve the performance of traditional multi-set query processing. The key idea is to cast the task as a classification problem and train an oracle to predict which set contains a query item e. To ensure an exact query result, we combine the learned oracle with a standard Bloom filter and an exact-match index that catch items the oracle does not identify correctly. When the oracle is both small and efficient, overall query performance improves. Our framework treats the learned oracle as a complete black box and does not depend on its inner workings. Theoretical proofs and experimental results show that, compared to the state of the art, our approach achieves a 0% error rate while using much less memory at a comparable speed.
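The combination described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the three components interact, so the query path below (Bloom filter screens for oracle misses, exact-match index corrects them) is an assumption. The names `BloomFilter`, `OracleMultiSetIndex`, and `toy_oracle` are hypothetical, the sketch assumes every item belongs to exactly one set, and the oracle is a stand-in function rather than a trained model.

```python
import hashlib
from typing import Callable, Dict, Iterator


class BloomFilter:
    """A small standard Bloom filter over strings (illustrative sizes)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class OracleMultiSetIndex:
    """Black-box oracle plus a backup path for the items it gets wrong.

    Assumed design: items the oracle misclassifies at build time are
    recorded in a Bloom filter and in an exact-match index.
    """

    def __init__(self, oracle: Callable[[str], int], items: Dict[str, int]) -> None:
        self.oracle = oracle          # treated as a black box
        self.misses = BloomFilter()   # cheap screen for misclassified items
        self.exact: Dict[str, int] = {}  # exact-match index of corrections
        for item, set_id in items.items():
            if oracle(item) != set_id:
                self.misses.add(item)
                self.exact[item] = set_id

    def query(self, item: str) -> int:
        # A Bloom filter has no false negatives, so every misclassified
        # item reaches the exact index; false positives fall through
        # to the oracle, whose answer is correct for them.
        if item in self.misses and item in self.exact:
            return self.exact[item]
        return self.oracle(item)


def toy_oracle(item: str) -> int:
    # Stand-in for a trained classifier: predicts a set ID from length.
    return len(item) % 3


# Ground truth: item -> ID of the set that contains it.
items = {"apple": 2, "kiwi": 1, "banana": 0, "fig": 1}
index = OracleMultiSetIndex(toy_oracle, items)
```

In this toy setup the oracle misclassifies only "fig", so the exact-match index holds a single entry while every stored item is still answered correctly. In Python the dictionary lookup alone would suffice; the Bloom filter is kept to show where a compact screen would sit in front of a more expensive exact structure.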

Full Text