Efficient synthetical clustering validity indexes for hierarchical clustering

Qin Xu, Qiang Zhang, Jinpei Liu, Bin Luo

doi:10.1016/j.eswa.2020.113367


Abstract

Clustering validation and identification of the optimal number of clusters are of great importance in expert and intelligent systems. However, the similarity measures commonly used for validation are not versatile enough to capture the complex data structures encountered in practice, and some are less effective than the measure used by the clustering algorithm that produced the results. This paper studies validity indexes for hierarchical clustering algorithms and proposes a unified validity index framework. For single-linkage agglomerative hierarchical clustering, we propose two efficient synthetical clustering validity (SCV) indexes that use the minimum spanning tree to calculate intra-cluster compactness, overcoming the deficiencies of the measurements in existing validity indexes. For general hierarchical clustering, we develop a self-adaptive similarity measure strategy and two generalized synthetical clustering validity (GSCV) indexes, which extend the proposed SCV indexes. The proposed SCV and GSCV indexes constitute a unified validity index framework, in which the SCV index is a special case of the GSCV index, and which avoids incompatibility between the similarity measure used for clustering and the one used for validation. Experimental comparisons with state-of-the-art validity indexes on artificial and real-world data sets demonstrate the efficiency of the proposed validity indexes in discovering the true number of clusters and in dealing with various sorts of data sets, including imbalanced data sets.
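
To make the idea of measuring intra-cluster compactness with a minimum spanning tree concrete, the following is a minimal sketch in Python. It is not the SCV index defined in the paper, whose formula is not given in this abstract. The function names `mst_edge_weights` and `mst_compactness`, the use of Euclidean distance, and the choice of the longest tree edge as the compactness score are all assumptions made for illustration.

```python
import math


def mst_edge_weights(points):
    """Edge weights of a Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    in_tree[0] = True
    # best[i]: shortest distance from point i to any point already in the tree
    best = [math.dist(points[0], p) for p in points]
    weights = []
    for _ in range(n - 1):
        # attach the outside point that is closest to the current tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        weights.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return weights


def mst_compactness(points):
    """Illustrative compactness (an assumption, not the paper's definition):
    the longest edge of the cluster's minimum spanning tree."""
    return max(mst_edge_weights(points), default=0.0)


# An elongated, chain-like cluster: its diameter is 3.0, but every
# nearest-neighbour gap along the chain is 1.0.
chain = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
chain_score = mst_compactness(chain)
```

For the chain above, a diameter-based measure would report a spread of 3.0, whereas the tree-based score only sees the unit gaps between neighbours. This matches the way single-linkage clustering itself merges points, which is the kind of compatibility between clustering and validation that the abstract argues for.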

Full Text