Efficient Group Communication for Large-Scale Parallel Clustering

David Pettinger, Giuseppe Di Fatta

doi:10.1007/978-3-642-32524-3_20

Abstract

Global communication requirements and load imbalance in some parallel data mining algorithms are major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm in which the requirement of global communication can be relaxed while still providing the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution that can either be found in real-world distributed applications or be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error, which allows a further reduction of the communication costs.
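
To make the communication pattern concrete, the following is a minimal single-process sketch of one iteration of the straightforward parallel formulation mentioned in the abstract, in which every worker contributes partial sums and counts to a global reduction. It is an illustration under stated assumptions, not the authors' implementation: the function names (`local_stats`, `global_reduce`, `kmeans_step`) are hypothetical, the workers are simulated as a list of partitions, and the relaxed group communication proposed in the paper is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_stats(points: List[Point], centroids: List[Point]):
    """Per-worker step: assign each local point to its nearest centroid
    and return per-cluster coordinate sums and point counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        j = min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def global_reduce(partials):
    """Stand-in for the global reduction (assumption: an element-wise sum
    over the partial results of all workers)."""
    k = len(partials[0][1])
    dim = len(partials[0][0][0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for part_sums, part_counts in partials:
        for j in range(k):
            counts[j] += part_counts[j]
            for d in range(dim):
                sums[j][d] += part_sums[j][d]
    return sums, counts


def kmeans_step(partitions: List[List[Point]], centroids: List[Point]) -> List[Point]:
    """One iteration: local statistics on every partition, one global
    reduction, then the centroid update. Empty clusters keep their centroid."""
    partials = [local_stats(part, centroids) for part in partitions]
    sums, counts = global_reduce(partials)
    return [
        tuple(s / counts[j] for s in sums[j]) if counts[j] else centroids[j]
        for j in range(len(centroids))
    ]


# Two simulated workers, two clusters.
partitions = [[(0.0, 0.0), (10.0, 10.0)], [(0.0, 2.0), (10.0, 12.0)]]
centroids = [(1.0, 1.0), (9.0, 9.0)]
print(kmeans_step(partitions, centroids))
```

Because the reduction only sums per-cluster statistics, the partitioned step yields the same centroids as running the step on the pooled data. In this sketch every worker takes part in every reduction, which is the per-iteration global cost the abstract identifies as a scalability obstacle.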

Similar Papers

Non-uniform data distribution for communication-efficient parallel clustering
Tabitha Goodall ... Giuseppe Di Fatta
Journal of Computational Science | VOL. 4
04 Feb 2013

Dynamic group communication for large-scale parallel data mining
Amogh Katti ... Giuseppe Di Fatta
Concurrent Engineering | VOL. 21
01 Aug 2013

Development of Tools for Creating Parallel Data Mining Algorithms
Karshiyev Zaynidin ... Sattarov Mirzabek
Journal of Advanced Research in Applied Sciences and Engineering Technology | VOL. 39
07 Feb 2024

Research on a Scalable Parallel Data Mining Algorithm
Jinlin Wang ... Kefa Zhou
01 Jan 2009
