Abstract

Sampling provides fundamental support to numerous applications that cannot afford to materialize all the objects arriving at a rapid speed. Existing stream sampling algorithms guarantee small space and query overhead, but all require worst-case update time proportional to the number of samples. This creates a performance issue when a large sample set is required. In this paper, we propose a new sampling algorithm that is optimal simultaneously in all the three aspects: space, query time, and update time. In particular, the algorithm handles an update in O(1) worst-case time with a very small hidden constant. Our algorithm also ensures a strong independence guarantee: the sample sets of all the queries are mutually independent as long as the overlap between two query windows is small.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.