Parallel hierarchical clustering using weighted confidence affinity

Baoying Wang,Imad Rahal,Aijuan Dong

doi:10.1504/ijdmmm.2011.041491

Abstract

There have been many attempts at clustering categorical data such as market basket datasets. However, most categorical clustering approaches are partitional and therefore require at least one input parameter (e.g., the minimum intra-cluster similarity or the desired number of clusters). In this paper, we propose a parallelised hierarchical clustering approach for categorical data (PH-clustering) using vertical data structures. To minimise the impact of low-support items, we devise a weighted confidence (WC) affinity function to compute the similarity between clusters. Based on our analysis of the major clustering steps, we adopt a partially local, partially global approach that reduces computation time while keeping network communication to a minimum. Load-balancing issues are addressed, especially during the data partitioning phase. Our experimental results on standardised market basket data show that the proposed weighted confidence affinity measure is more accurate than other contemporary affinity measures in the literature and that our parallel clustering approach yields orders-of-magnitude time improvements over sequential clustering, especially on larger datasets. Our results also indicate that the number of items/attributes in the dataset has a more drastic impact on performance than the number of transactions/tuples.
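The abstract does not give the WC formula, so the following is only a minimal sketch of the general idea under stated assumptions. It stores each item as a vertical bitmap (one bit per transaction) and scores a pair of items by averaging the two directional confidences, each weighted by its antecedent's support, so that a rare item's confidence counts for less. The weighting scheme and the names `build_vertical`, `weighted_confidence`, and `most_similar_pair` are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def build_vertical(transactions):
    """Map each item to an integer bitmap; bit t is set if transaction t contains it."""
    bitmaps = {}
    for t, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps


def support(bitmap):
    """Number of transactions covered by a bitmap (its popcount)."""
    return bin(bitmap).count("1")


def confidence(a, b):
    """Confidence of the rule a -> b: support(a AND b) / support(a)."""
    sa = support(a)
    return support(a & b) / sa if sa else 0.0


def weighted_confidence(a, b):
    """ASSUMED affinity: both directional confidences, weighted by antecedent support.

    This is one plausible weighting, not the formula from the paper.
    """
    sa, sb = support(a), support(b)
    total = sa + sb
    if total == 0:
        return 0.0
    return (sa * confidence(a, b) + sb * confidence(b, a)) / total


def most_similar_pair(bitmaps):
    """Return the pair of items with the highest affinity (first agglomerative merge)."""
    return max(
        combinations(sorted(bitmaps), 2),
        key=lambda pair: weighted_confidence(bitmaps[pair[0]], bitmaps[pair[1]]),
    )


# Toy market basket data (hypothetical).
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"eggs", "caviar"},
]
bitmaps = build_vertical(transactions)
best = most_similar_pair(bitmaps)
```

On this toy data, `best` is the pair bread and milk, which always co-occur. For eggs and caviar, an unweighted average of the two confidences would be 0.75, whereas the support-weighted score is 2/3, because the single-transaction item contributes less. A hierarchical method would repeat the merge step on the resulting clusters; how the paper represents merged clusters and distributes the work across nodes is not specified in the abstract.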

Journal: International Journal of Data Mining, Modelling and Management
Publication Date: Jan 1, 2011
Citations: 7

Similar Papers

High-Speed Identification of Language and Script
Alan Ratner ... Ron Loui
01 Oct 2007

WC-Clustering: Hierarchical Clustering Using the Weighted Confidence Affinity Measure
Baoying Wang ... Imad Rahal
01 Oct 2007

A multi-act sequential game-based multi-objective clustering approach for categorical data
Imen Heloulou ... Mohand Tahar Kechadi
Neurocomputing | VOL. 267
09 Jun 2017

THE TABU SEARCH APPLICATION: AN APPROACH TO MINE MARKET BASKET DATA
Haibo Wang ... Manying Qiu
Review of Business Research | VOL. 13
01 Mar 2013
