Abstract

Determining the number of clusters in a data set is a central problem in cluster analysis, and it is usually addressed with a clustering validity index (CVI). Although many CVIs have been proposed, the fuzzy c-means (FCM) clustering algorithm can produce imperfect results on imbalanced data sets, and these results may mislead the indices. To address this problem, this paper first analyzes how imperfect clustering results affect traditional CVIs and finds that the distance between two imbalanced clusters shrinks, which in turn distorts the separation metric. Motivated by this finding, a new fuzzy CVI called the imbalanced index (IMI) is proposed. IMI is the ratio of the fuzzy compactness and separation metrics. Its main characteristic is a new definition of the separation metric, in which the imbalance ratio of two clusters is used to enlarge the distance between their centers. IMI is used to evaluate the clustering results of FCM on a variety of data sets and is compared with several well-known CVIs. The experimental results show that IMI is robust to the imperfect clustering results of FCM caused by imbalanced data distributions and that it outperforms the other CVIs.
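
The abstract does not give the formula for IMI, so the following is only a minimal sketch of the general idea, not the paper's definition. It assumes a Xie-Beni-style ratio (compactness divided by separation, lower is better), takes the imbalance ratio of two clusters to be the larger fuzzy cardinality divided by the smaller one, and multiplies the squared distance between the two centers by that ratio. The function names and the fuzzifier default `m=2.0` are hypothetical choices made for illustration.

```python
import numpy as np


def fuzzy_compactness(X, centers, U, m=2.0):
    """Mean membership-weighted squared distance of points to centers.

    X: (n, d) data, centers: (c, d), U: (c, n) fuzzy memberships.
    """
    # Squared distances between every center and every point, shape (c, n).
    d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
    return float(((U ** m) * d2).sum() / X.shape[0])


def imbalance_weighted_separation(centers, U):
    """Smallest pairwise squared center distance, enlarged by an imbalance ratio.

    ASSUMPTION: ratio = larger fuzzy cardinality / smaller fuzzy cardinality.
    This is an illustrative stand-in, not the separation metric of the paper.
    Every cluster is assumed to have non-zero total membership.
    """
    sizes = U.sum(axis=1)  # fuzzy cardinality of each cluster
    c = centers.shape[0]
    best = np.inf
    for i in range(c):
        for j in range(i + 1, c):
            ratio = max(sizes[i], sizes[j]) / min(sizes[i], sizes[j])  # >= 1
            d2 = ((centers[i] - centers[j]) ** 2).sum()
            best = min(best, ratio * d2)
    return float(best)


def illustrative_index(X, centers, U, m=2.0):
    """Compactness over separation; the direction of the ratio is an assumption."""
    return fuzzy_compactness(X, centers, U, m) / imbalance_weighted_separation(centers, U)
```

With balanced clusters the ratio equals 1 and the separation term reduces to the plain minimum squared distance between centers. For a large cluster next to a small one, the ratio grows and offsets the shrinking distance that the abstract describes. In practice, `U` and `centers` would come from an FCM run for each candidate number of clusters, and the index values would be compared across those runs.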
