An Effective and Efficient Constrained Ward’s Hierarchical Agglomerative Clustering Method

Abeer A Aljohani, Eran A Edirisinghe, Daphne Teck Ching Lai

doi:10.1007/978-3-030-29516-5_46

Abstract

Constraint-based hierarchical clustering (HC) has emerged as an important improvement over existing clustering algorithms. Triple-wise relative constraints are well suited to HC because they enable the derivation of a cluster hierarchy rather than a flat partition. This paper proposes the Constrained Ward’s Hierarchical Agglomerative Clustering algorithm (CWHAC), a novel variation of Ward’s hierarchical agglomerative clustering method based on triple-wise relative constraints. The algorithm builds on an ultrametric transformation of the dissimilarity matrix, which exploits the triple-wise relative constraints as background knowledge to create a new metric for data similarity. The IPoptim and UltraTran methods are introduced to incorporate the triple-wise relative constraints by modifying and updating the similarity metric used by the proposed algorithm. To improve the effectiveness of CWHAC, this study addresses the non-satisfaction of triple-wise relative constraints in HC, specifically constraint violation and redundancy. Furthermore, this paper presents three computational optimization strategies for generating constraints to enhance the efficiency of CWHAC on massive data sets. The proposed algorithm is validated on seven benchmark UCI datasets, using F-score to measure effectiveness and execution time to measure efficiency, while varying the proportion of constraints. Experimental results demonstrate improvements by the proposed algorithm over existing constraint-based and unsupervised Ward’s hierarchical clustering algorithms. Finally, a Mann-Whitney test is performed to confirm that the improvement is statistically significant.
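
To make the notion of a triple-wise relative constraint concrete, the following minimal Python sketch builds a plain (unconstrained) Ward's hierarchy and reports which constraints it violates. It is not CWHAC, IPoptim, or UltraTran. It assumes a constraint (a, b, c) means that points a and b should merge before either of them merges with c, and it uses the merge step index as a simple ultrametric-style height. The function names `ward_merge_levels` and `satisfied` and the toy data are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def ward_merge_levels(points):
    """Naive Ward's agglomeration (illustrative, not optimized).

    Returns level[i][j]: the merge step at which points i and j first
    share a cluster. These levels behave like an ultrametric on the data.
    """
    n = len(points)
    clusters = {i: [i] for i in range(n)}            # cluster id -> members
    centroids = {i: list(p) for i, p in enumerate(points)}
    level = [[0] * n for _ in range(n)]
    step = 0

    def cost(pair):
        # Ward's criterion: increase in within-cluster sum of squares.
        a, b = pair
        na, nb = len(clusters[a]), len(clusters[b])
        d2 = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
        return na * nb / (na + nb) * d2

    while len(clusters) > 1:
        step += 1
        a, b = min(combinations(sorted(clusters), 2), key=cost)
        for i in clusters[a]:
            for j in clusters[b]:
                level[i][j] = level[j][i] = step
        na, nb = len(clusters[a]), len(clusters[b])
        centroids[a] = [(na * x + nb * y) / (na + nb)
                        for x, y in zip(centroids[a], centroids[b])]
        clusters[a].extend(clusters.pop(b))
        del centroids[b]
    return level


def satisfied(level, triple):
    """Assumed convention: (a, b, c) holds if a and b merge before c joins."""
    a, b, c = triple
    return level[a][b] < level[a][c] and level[a][b] < level[b][c]


# Toy data: two tight pairs and one distant point (hypothetical example).
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
constraints = [(0, 1, 2), (2, 3, 4), (0, 2, 1), (0, 4, 2)]

level = ward_merge_levels(points)
violated = [t for t in constraints if not satisfied(level, t)]
print("violated constraints:", violated)
```

A constrained method such as the one described in the abstract would use violated constraints like these as background knowledge to adjust the dissimilarities before or during clustering; the sketch only detects the violations.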

Full Text