Abstract

Validating clustering results and identifying the optimal number of clusters are of great importance in expert and intelligent systems. However, the similarity measures commonly used for validation are not versatile enough to capture the complex data structures found in practice, and some are less effective than the measure used by the clustering algorithm that produced the results. This paper studies validity indexes for hierarchical clustering algorithms and proposes a unified validity index framework. For single-linkage agglomerative hierarchical clustering, we propose two efficient synthetical clustering validity (SCV) indexes that use the minimum spanning tree to calculate intra-cluster compactness, overcoming the deficiencies of the measurements in existing validity indexes. For general hierarchical clustering, we develop a self-adaptive similarity measure strategy and two generalized synthetical clustering validity (GSCV) indexes, which extend the proposed SCV indexes. The proposed SCV and GSCV indexes constitute a unified validity index framework in which the SCV index is a special case of the GSCV index, and which avoids incompatibility between the similarity measures used for clustering and for validation. Experimental comparisons with state-of-the-art validity indexes on artificial and real-world data sets demonstrate the efficiency of the proposed validity indexes in discovering the true number of clusters and in handling various kinds of data sets, including imbalanced ones.
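To illustrate why a minimum spanning tree suits single-linkage results, the sketch below measures the compactness of one cluster from its MST edges. It is not the SCV index, whose formula is not given here. The function names `mst_edge_weights` and `mst_compactness`, the use of Euclidean distance, and the choice of the longest MST edge as the score are all assumptions made for illustration.

```python
import math


def mst_edge_weights(points):
    """Return the edge weights of a Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    best[0] = 0.0          # start the tree from the first point
    weights = []
    for step in range(n):
        # Pick the point outside the tree with the cheapest connection.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:       # the starting point contributes no edge
            weights.append(best[u])
        # Relax the connection cost of every point still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return weights


def mst_compactness(points):
    """Hypothetical compactness score: the longest MST edge in a cluster (0.0 for a singleton)."""
    return max(mst_edge_weights(points), default=0.0)


# An elongated, chain-shaped cluster: the diameter is 3, but every MST edge has length 1.
chain = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(mst_compactness(chain))
```

A diameter-based or centroid-based measure would penalise the chain for being elongated, even though single-linkage treats it as one tightly connected cluster. The longest MST edge equals the single-linkage distance at which the cluster becomes connected, so a measure built on it stays consistent with the clustering criterion.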
