Abstract

The fuzzy c-means (FCM) clustering algorithm requires the number of clusters to be specified in advance. When applied to imbalanced datasets, FCM tends to equalize cluster sizes and therefore produces poor clustering results. Although a clustering validity index (CVI) is commonly used to determine the number of clusters, it is often ineffective when the underlying FCM results are poor. To address this issue, we previously proposed IMI, a fuzzy CVI for imbalanced datasets. However, IMI runs into problems when the number of clusters exceeds three, because it tends to divide the majority clusters into multiple subclusters. In this paper, a novel CVI, IMI2, is proposed to improve on IMI. IMI2 adopts a two-step clustering algorithm that merges some clusters from the IMI result based on their separation values. This overcomes the limitation of IMI and allows IMI2 to determine the number of clusters accurately for datasets with multiple imbalanced clusters. The performance of IMI2 is evaluated on several synthetic and UCI datasets. The results show that IMI2 outperforms other methods on datasets with multiple imbalanced clusters, while retaining the good performance of IMI on datasets with two or three clusters.
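
As background for the size-equalizing behaviour of FCM mentioned above, the following is a minimal sketch of the standard FCM iteration in Python with NumPy. It is not the IMI or IMI2 procedure, whose formulas are not given in this abstract. The function name, the parameter names, and the synthetic 1000-versus-50 point dataset are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Standard FCM sketch. Returns (centers, U) with U of shape (n, c).

    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)

    def update_centers(U):
        # Centers are means of the data weighted by memberships raised to m.
        W = U ** m
        return (W.T @ X) / W.sum(axis=0)[:, None]

    for _ in range(max_iter):
        centers = update_centers(U)
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # u_ik is proportional to d_ik^(-2/(m-1)), normalized over clusters.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return update_centers(U), U


# Hypothetical imbalanced data: 1000 majority points and 50 minority points.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=1.0, size=(1000, 2)),
    rng.normal(loc=(4.0, 0.0), scale=0.3, size=(50, 2)),
])
centers, U = fuzzy_c_means(X, c=2)
hard_sizes = np.bincount(U.argmax(axis=1), minlength=2)
print("cluster sizes after hardening:", hard_sizes)
print("centers:", centers)
```

Comparing the printed cluster sizes with the true 1000 and 50 split gives a quick check of how far FCM has shifted points from the majority cluster toward the minority one on a given dataset.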
