Abstract

Clustering, an unsupervised pattern classification method, plays an important role in identifying the structure of input datasets. It partitions an input dataset into clusters or groups, where the optimum number of clusters is either known a priori or determined automatically. In automatic clustering, performance is evaluated using a cluster validity index (CVI), which determines the optimum number of clusters in the data. Previous work has shown that improperly positioned cluster centroids produced by clustering algorithms can degrade the validation process and the performance of state-of-the-art CVIs. In addition, those CVIs work properly only with certain clustering algorithms and simple dataset structures; their performance degrades when they are applied to other clustering algorithms or to more complex datasets. This study proposes an efficient CVI, namely, the validity clustering index based on finding the mean of clustered data (VCIM). The proposed approach combines the properties of the score function index and the mean to determine new cluster centroid positions. The performance of the VCIM index is compared with that of well-known CVIs on both artificial and real-life datasets. The results on artificial datasets show that the proposed VCIM index outperforms the other CVIs in determining the true number of clusters for five conventional clustering algorithms, namely, K-means, fuzzy C-means, agglomerative hierarchical average linkage clustering, variance-based differential evolution, and the density peaks clustering and particle swarm optimization (PDPC) algorithm.
For the 14 real-world datasets, the proposed VCIM index correctly determined the optimum number of clusters for 11 of the 14 datasets with the K-means clustering algorithm, 9 of 14 with both the fuzzy C-means and agglomerative hierarchical average linkage clustering algorithms, 12 of 14 with the variance-based differential evolution algorithm, and 11 of 14 with PDPC. These results demonstrate the significance of the proposed VCIM when combined with clustering algorithms and indicate its potential in various clustering applications.
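
The automatic-clustering workflow described in the abstract (run a clustering algorithm for several candidate cluster counts, score each partition with a CVI, keep the best-scoring count) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the VCIM formula, so the well-known silhouette coefficient stands in as the validity index, and the function names (`farthest_point_init`, `kmeans`, `silhouette`, `estimate_k`) and the toy data are hypothetical, not taken from the paper.

```python
import math


def farthest_point_init(points, k):
    """Pick k starting centroids deterministically (maximin rule)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest chosen centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=100):
    """Plain K-means; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (stand-in CVI, NOT the paper's VCIM)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def estimate_k(points, candidate_ks):
    """Return the candidate cluster count with the highest index value."""
    best_k, best_score = None, -math.inf
    for k in candidate_ks:
        score = silhouette(points, kmeans(points, k))
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Toy data (an assumption for illustration): three well-separated 3x3 grids.
offsets = (-0.5, 0.0, 0.5)
blobs = [
    (cx + dx, cy + dy)
    for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    for dx in offsets
    for dy in offsets
]
best_k = estimate_k(blobs, range(2, 7))
print(best_k)
```

Any other index could be swapped in for `silhouette` without changing `estimate_k`, provided a higher value means a better partition; an index that is minimized would need the comparison reversed.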
