Abstract
Given a weighted set S of n elements, weighted set sampling (WSS) samples an element in S so that each element a i ; is sampled with a probability proportional to its weight w ( a i ). The classic alias method pre-processes an index in O ( n ) time with O ( n ) space and handles WSS with O (1) time. Yet, the alias method does not support dynamic updates. By minor modifications of existing dynamic WSS schemes, it is possible to achieve an expected O (1) update time and draw t independent samples in expected O ( t ) time with linear space, which is theoretically optimal. But such a method is impractical and even slower than a binary search tree-based solution. How to support both efficient sampling and updates in practice is still challenging. Motivated by this, we design BUS , an efficient scheme that handles an update in O (1) amortized time and draws t independent samples in O (log n + t) time with linear space. A natural extension of WSS is the weighted independent range sampling (WIRS) , where each element in S is a data point from R. Given an arbitrary range Q = [ℓ, r ] at query time, WIRS aims to do weighted set sampling on the set S Q of data points falling into range Q. We show that by integrating the theoretically optimal dynamic WSS scheme mentioned above, it can handle an update in O (log n ) time and can draw t independent samples for WIRS in O (log n + t ) time, the same as the state-of-the-art static algorithm. Again, such a solution by integrating the optimal dynamic WSS scheme is still impractical to handle WIRS queries. We further propose WIRS-BUS to integrate BUS to handle WIRS queries, which handles each update in O (log n ) time and draws t independent samples in O (log 2 n + t ) time with linear space. Extensive experiments show that our BUS and WIRS-BUS are efficient for both sampling and updates.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.