DHC: A Distributed Hierarchical Clustering Algorithm for Large Datasets

Wei Zhang, Xiumin Zhou, Xiaohui Chen, Gongxuan Zhang, Junlong Zhou, Yueqi Liu

doi:10.1142/s0218126619500658

Abstract

Hierarchical clustering is a classical method that provides a hierarchical representation of data for analysis. In practical applications, however, it is difficult to apply to massive datasets because of its high computational complexity. To overcome this challenge, this paper presents DHC, a novel hierarchical clustering algorithm built on distributed storage and computation, which has a lower time complexity than standard hierarchical clustering algorithms. The proposed approach is suited to hierarchical clustering on massive datasets and has the following advantages. First, it can store massive datasets that exceed main memory by using distributed storage nodes. Second, it can efficiently perform nearest neighbor search in parallel by using distributed computation at each node. Extensive experiments are carried out to validate the effectiveness of the DHC algorithm. Experimental results demonstrate that the algorithm is 10 times faster than the standard hierarchical clustering algorithm, making it an effective and flexible distributed hierarchical clustering algorithm for massive datasets.
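The abstract does not spell out how DHC partitions data or merges clusters, so the following is only a minimal sketch of the general idea it names: splitting the nearest neighbor search of agglomerative clustering across nodes and combining the per-node results. Everything here is an assumption made for illustration, including the function names `local_closest_pair` and `sharded_single_linkage`, the use of single linkage, the round-robin sharding, and the use of threads as stand-ins for storage and computation nodes. It is not the authors' algorithm.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist, inf


def local_closest_pair(shard_ids, clusters):
    """Return the closest (distance, i, j) pair where cluster i belongs to this shard."""
    best = (inf, -1, -1)
    for i in shard_ids:
        for j, members_j in clusters.items():
            if j <= i:
                continue  # each unordered pair is examined once, by the shard owning i
            # Single linkage (an assumption): minimum point-to-point distance.
            d = min(dist(p, q) for p in clusters[i] for q in members_j)
            if d < best[0]:
                best = (d, i, j)
    return best


def sharded_single_linkage(points, n_shards=2, n_clusters=1):
    """Agglomerative clustering whose nearest neighbor search is split across shards."""
    clusters = {i: [p] for i, p in enumerate(points)}
    merges = []  # (kept_id, absorbed_id, distance) for each merge step
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        while len(clusters) > n_clusters:
            ids = sorted(clusters)
            # Round-robin assignment of cluster ids to shards (hypothetical scheme).
            shards = [ids[k::n_shards] for k in range(n_shards)]
            results = pool.map(local_closest_pair, shards, [clusters] * n_shards)
            # The coordinator keeps the globally closest pair among the shard results.
            d, i, j = min(results)
            clusters[i] = clusters[i] + clusters.pop(j)
            merges.append((i, j, d))
    return clusters, merges


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
    final, history = sharded_single_linkage(pts, n_shards=2, n_clusters=3)
    print(final)
    print(history)
```

Because each shard only reports its best local pair, the coordinator compares one candidate per shard instead of every pair. In this sketch the threads share one in-memory dictionary, so it illustrates the division of the search rather than the memory savings that real distributed storage nodes would provide.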

Journal: Journal of Circuits, Systems and Computers
Publication Date: Mar 31, 2019
Citations: 11

Similar Papers

An Effective and Efficient Constrained Ward’s Hierarchical Agglomerative Clustering Method
Abeer A Aljohani ... Daphne Teck Ching Lai
24 Aug 2019

Hierarchical Density-Based Clustering Using MapReduce
Joelson Antonio Dos Santos ... Murilo C Naldi
IEEE Transactions on Big Data | VOL. 7
01 Mar 2021

An efficient hierarchical clustering model for grouping web transactions
Darenna Syahida Suib ... Mustafa Mat Deris
International Journal of Business Intelligence and Data Mining | VOL. 3
01 Jan 2008

A Kind of Hierarchical K-Means Web Log Clustering Algorithm
Li Xia Liu ... Yi Qi Zhuang
Key Engineering Materials | VOL. 439-440
01 Jun 2010
