Abstract

Two well-known drawbacks of fuzzy clustering are the need to assign the number of clusters in advance and the random initialization of the cluster centers. Because the quality of the final fuzzy clusters depends heavily on both choices, it is necessary to run the clustering algorithm several times and to apply a validity index that measures the compactness and separability of the final clusters. We propose a new fuzzy C-means algorithm in which a validity index based on the concepts of maximum fuzzy energy and minimum fuzzy entropy is used to find the optimal number of clusters and the initial cluster centers, in order to obtain good clustering quality without increasing time consumption. We test our algorithm on UCI (University of California at Irvine) machine learning classification datasets, comparing the results with those obtained using well-known validity indices and with variations of fuzzy C-means that use optimization algorithms in the initialization phase. The comparison shows that our algorithm achieves an optimal trade-off between clustering quality and time consumption.

Highlights

  • A validity index is a measure applied in fuzzy clustering to evaluate the compactness of clusters and the separability among clusters. Numerous validity indices have been applied to measure the compactness and separateness of clusters detected by the fuzzy C-means (FCM) algorithm [1,2]. The two main drawbacks of FCM are the random setting of the initial clusters and the requirement of assigning the number of clusters in advance.

  • Because the quality of the final fuzzy clusters depends on the choice of the number of clusters, a validity index is needed to determine the optimal number of clusters.

  • We propose an FCM variation in which a new validity index based on the De Luca and Termini fuzzy entropy and fuzzy energy concepts [17,18] is used to optimize the initialization of the clusters and to find the optimal number of clusters.
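
To make the two quantities named in the last highlight concrete, the following is a minimal sketch of the De Luca and Termini fuzzy entropy and of a fuzzy energy measure for a single fuzzy set, given as a list of membership degrees in [0, 1]. The function names, the normalization by n ln 2, and the choice e(u) = u for the energy are assumptions made for illustration; the paper's validity index combines these measures in a way that is not reproduced here.

```python
import math


def fuzzy_entropy(memberships):
    """De Luca-Termini fuzzy entropy, normalized to [0, 1] by n * ln(2).

    The normalization constant is an illustrative assumption.
    """
    def shannon(u):
        # Convention 0 * ln(0) = 0: crisp memberships contribute nothing.
        if u <= 0.0 or u >= 1.0:
            return 0.0
        return -u * math.log(u) - (1.0 - u) * math.log(1.0 - u)

    n = len(memberships)
    return sum(shannon(u) for u in memberships) / (n * math.log(2.0))


def fuzzy_energy(memberships):
    """Fuzzy energy with the simplest admissible choice e(u) = u (assumed),
    averaged over the n elements so that the value lies in [0, 1]."""
    return sum(memberships) / len(memberships)
```

Under these assumptions, a crisp set such as `[0.0, 1.0]` has zero entropy, while a maximally ambiguous set such as `[0.5, 0.5]` has entropy 1; a cluster whose points all have membership close to 1 has energy close to 1. This matches the intuition behind seeking high energy and low entropy.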


Summary

Introduction

A validity index is a measure applied in fuzzy clustering to evaluate the compactness of clusters and the separability among clusters. A simple technique for addressing the two main drawbacks of FCM (the random initialization of the cluster centers and the need to fix the number of clusters in advance) is to execute the clustering algorithm several times, varying the initial cluster centers and the number of clusters, and to choose the optimal clustering using a validity index that measures the quality of the final clustering. Three hybrid FCM algorithms, based on Differential Evolution, GA, and PSO methods, are proposed in Reference [16] to optimize the initialization of the cluster centers. These algorithms, while guaranteeing a higher quality of results, require excessively long execution times, and they too are unsuitable for handling high-dimensional data. The algorithm proposed in Reference [19] is less time-consuming than hybrid algorithms using meta-heuristic approaches but, like the algorithm proposed in Reference [10], it applies an iterative pre-processing method to initialize the cluster centers. It does not detect the optimal number of clusters, which must be set in advance.
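
The repeated-run technique described above can be sketched as follows. This is a minimal illustration, not the proposed algorithm: it uses a plain FCM implementation with random initial memberships and, as a stand-in validity index, Bezdek's partition coefficient rather than the energy- and entropy-based index of the paper. The function names (`fcm`, `partition_coefficient`, `select_clustering`) and the parameter defaults are assumptions.

```python
import numpy as np


def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy C-means with random initial memberships (illustrative)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)      # each row of memberships sums to 1
    for _ in range(max_iter):
        W = U ** m
        # Centers are membership-weighted means of the data points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)              # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def partition_coefficient(U):
    """Bezdek's partition coefficient in [1/c, 1]; higher means crisper."""
    return float((U ** 2).sum() / U.shape[0])


def select_clustering(X, c_values, n_restarts=5):
    """Run FCM for every (c, seed) pair and keep the best-scoring result."""
    best = None
    for c in c_values:
        for seed in range(n_restarts):
            centers, U = fcm(X, c, seed=seed)
            score = partition_coefficient(U)
            if best is None or score > best[0]:
                best = (score, c, centers, U)
    return best
```

A call such as `select_clustering(X, c_values=[2, 3, 4])` performs `len(c_values) * n_restarts` full FCM runs, which illustrates why this brute-force strategy becomes expensive and why an index that also guides the initialization is attractive.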

Fuzzy Energy and Entropy Measures
Fuzzy C-Means Algorithm
The Proposed FCM Algorithm Based on a Fuzzy Energy and Entropy Validity Index
Results
Number of iterations of the PEHFCM
Method