Designing a space-efficient data structure that answers membership queries with high accuracy and real-time response is a challenging task in stream processing. Many techniques have been developed to answer these queries in a sliding-window manner. However, it is not always practical to assume that the user will query with a preset window size. In this paper, we introduce a novel data structure called the Learned Cuckoo Filter (LCF). It provides satisfactory results for approximate membership queries on data streams, regardless of the user-defined query window. LCF operates by adaptively maintaining cuckoo filters with the assistance of a well-trained oracle that has learned the frequency features of the data in the stream. To further improve memory utilization, we develop a compact version of LCF (denoted by LCF_C), which selectively removes redundant information to reduce space consumption without compromising query accuracy. Furthermore, we conduct a thorough theoretical analysis of query accuracy and provide detailed guidelines for optimal parameter selection (the resulting variant is denoted by LCF_O). Extensive experimental studies on synthetic and real-world datasets demonstrate the superiority of the proposed methods in terms of both space consumption and accuracy. Compared to state-of-the-art algorithms, LCF_O reduces space cost by up to 61% at the same error level and achieves up to 12× better accuracy at the same space cost.