Abstract

Multi-set membership queries are fundamental operations in data science. In this paper, we propose a new data structure for multi-set membership queries, named coloring embedder, which is fast, accurate, and memory efficient. The idea of coloring embedder is to first map elements to a high-dimensional space, which nearly eliminates hashing collisions, and then use a dimensional reduction representation, similar to coloring a graph, to save memory. Theoretical proofs and experimental results show that the coloring embedder is effective in solving the problem of multi-set membership queries. We also find that web cache sharing is one of the typical application scenarios of the multi-set membership queries and current methods based on Bloom filters always send redundant queries. We apply coloring embedder to web cache sharing by arranging our data structure on the on-chip and off-chip memory and designing query, insertion and deletion operations for this scenario. The experimental results show that compared with the present method, our method can reduce the queries sent by proxies while reaching equal hit rate with the same size of on-chip memory. The source code of coloring embedder has been released on Github.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call