Abstract

Clustering and diversification are two central problems with various applications in machine learning, data mining, and information retrieval. k-center clustering and k-diversity maximization are two of the most well-studied and widely used problems in this area. Both problems admit sequential algorithms with optimal approximation factors of 2 in any metric space. However, finding distributed algorithms matching the same optimal approximation ratios has been open for more than a decade, with the best current algorithms having factors at least twice the optimal. In this paper, we settle this open problem by presenting constant-round distributed algorithms for k-center clustering and k-diversity maximization in the massively parallel computation (MPC) model, achieving an approximation factor of 2 + ε in any metric space for any constant ε > 0, which is essentially the best possible considering the lower bound of 2 on the approximability of both these problems. Our algorithms are based on a novel technique for approximating vertex degrees and finding a so-called k-bounded maximal independent set in threshold graphs, using only a constant number of MPC rounds. Our general technique also yields other applications, including an almost optimal (3 + ε)-approximation algorithm for the k-supplier problem in any metric space in the MPC model.
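For context, the sequential 2-approximation for k-center mentioned above is the classical farthest-point greedy. The sketch below is an illustration of that baseline only, not the paper's MPC algorithm; the function names and the Euclidean distance are assumptions chosen for the example, and any metric can be plugged in.

```python
# Minimal sketch of the classical sequential farthest-point greedy,
# which gives a 2-approximation for k-center in any metric space.
# This is NOT the distributed (MPC) algorithm of the paper.
import math


def euclidean(p, q):
    """Example metric; any metric distance function can be substituted."""
    return math.dist(p, q)


def greedy_k_center(points, k, dist=euclidean):
    """Return k centers chosen by repeatedly taking the farthest point.

    The clustering radius induced by the returned centers is at most
    twice the optimal k-center radius.
    """
    centers = [points[0]]                      # arbitrary first center
    d = [dist(p, centers[0]) for p in points]  # distance to nearest chosen center
    for _ in range(1, k):
        i = max(range(len(points)), key=lambda j: d[j])  # farthest remaining point
        centers.append(points[i])
        d = [min(d[j], dist(points[j], points[i])) for j in range(len(points))]
    return centers


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (10, 1), (5, 5)]
    print(greedy_k_center(pts, k=2))
```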
