Abstract

We present a low-constant approximation for the metric k-median problem on insertion-only streams using $O(\varepsilon^{-3} k \log n)$ space. In particular, we present a streaming $(O(\varepsilon^{-3} k \log n), 2+\varepsilon)$-bicriterion solution that reports cluster weights. Running the offline approximation algorithm due to Byrka et al. (2015) on this bicriterion solution yields a $(17.66+\varepsilon)$-approximation (Guha et al., 2003; Charikar et al., 2003; Braverman et al., 2011). Our result matches the best-known space requirements for streaming k-median clustering while significantly improving the approximation accuracy. We also provide a lower bound, showing that any $\mathrm{polylog}(n)$-space streaming algorithm that maintains an $(\alpha, \beta)$-bicriterion must have $\beta \geq 2$. Our technique breaks the stream into segments defined by jumps in the optimal clustering cost, which increases monotonically as the stream progresses. By storing an accurate summary of recent segments of the stream and a lower-space summary of older segments, our algorithm maintains an $(O(\varepsilon^{-3} k \log n), 2+\varepsilon)$-bicriterion solution for the entirety of the stream.

In addition to our main result, we introduce a novel construction that we call a candidate set. This is a collection of points that, with high probability, contains k points that yield a near-optimal k-median cost. We present an algorithm called monotone faraway sampling (MFS) for constructing a candidate set in a single pass over a data stream. We show that using this candidate set in tandem with a coreset speeds up the search for a solution set of k cluster centers upon termination of the data stream. While coresets of smaller asymptotic size are known, the comparative simplicity of MFS makes it appealing as a practical technique.
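To make the post-processing step concrete, the following is a minimal sketch of how a weighted bicriterion summary might be reduced to exactly k centers once the stream ends. It is an illustration under stated assumptions, not the paper's algorithm: the streaming summary is hard-coded rather than computed, Euclidean distance stands in for a general metric, and an exhaustive search stands in for the offline algorithm of Byrka et al. (2015). All function and variable names are hypothetical.

```python
from itertools import combinations

def dist(p, q):
    """Euclidean distance, used here as one example of a metric."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def kmedian_cost(points, weights, centers):
    """Weighted k-median cost: each point pays weight * distance to its nearest center."""
    return sum(w * min(dist(p, c) for c in centers) for p, w in zip(points, weights))

def cluster_weights(points, centers):
    """Count the points assigned to each center (nearest center, ties to the lowest index)."""
    weights = [0] * len(centers)
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        weights[nearest] += 1
    return weights

def brute_force_kmedian(points, weights, k):
    """Exact weighted k-median with centers restricted to `points`.

    Assumption: this exhaustive search is only a stand-in for an offline
    approximation algorithm and is feasible only for tiny inputs.
    """
    return min(combinations(points, k), key=lambda cs: kmedian_cost(points, weights, cs))

# Toy stream with two well-separated groups.
stream = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Hypothetical bicriterion summary: more than k centers, chosen by hand here.
bicriterion_centers = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
weights = cluster_weights(stream, bicriterion_centers)

# Solve k-median on the small weighted summary instead of the full stream.
final_centers = brute_force_kmedian(bicriterion_centers, weights, k=2)
final_cost = kmedian_cost(stream, [1] * len(stream), final_centers)
```

The design point is that the offline solver only sees the weighted centers, so its running time depends on the summary size rather than on the stream length n.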

