PACk

Yue Wang, Vivek Narasayya, Yeye He, Surajit Chaudhuri

doi:10.14778/3514061.3514062


Journal: Proceedings of the VLDB Endowment
Publication Date: Feb 1, 2022
Citations: 2

Affiliation: Microsoft Research (United Kingdom)

Keywords: Agglomerative Hierarchical Clustering Algorithm, Agglomerative Hierarchical Clustering


Abstract

The Agglomerative Hierarchical Clustering (AHC) algorithm is widely used in real-world applications. As data volumes continue to grow, efficient scale-out techniques for AHC are becoming increasingly important. In this paper, we propose a Partition-based distributed Agglomerative Hierarchical Clustering (PACk) algorithm using novel distance-based partitioning and distance-aware merging techniques. We have developed an efficient implementation of PACk on Spark. Compared to the state-of-the-art distributed AHC algorithm, PACk achieves 2X to 19X (median=9X) speedup across a variety of synthetic and real-world datasets.
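For readers unfamiliar with the baseline, the following is a minimal single-machine sketch of agglomerative hierarchical clustering. It is not PACk and does not reflect the paper's partitioning or merging techniques. The single-linkage criterion, the function name `single_linkage_ahc`, and the sample points are assumptions made for illustration, since the abstract does not state which linkage is used.

```python
from itertools import combinations
from math import dist


def single_linkage_ahc(points):
    """Naive AHC on one machine, assuming single linkage (illustrative only).

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of point indices.
    """
    # Start with every point in its own cluster.
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance,
        # i.e. the smallest distance between any two of their members.
        best = None
        for a, b in combinations(clusters, 2):
            d = min(dist(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Replace the two closest clusters with their union.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges


# Hypothetical sample data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
merges = single_linkage_ahc(points)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

Each pass scans every pair of clusters, so this naive loop becomes expensive quickly as the number of points grows, which is the kind of cost that motivates scale-out approaches such as the one proposed here.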
