Abstract

With growing interest in the automatic understanding, processing and summarization of data, application domains such as pattern recognition, machine learning and computational biology increasingly rely on clustering algorithms. Among fuzzy clustering methods, Fuzzy C-Means (FCM) is the best known. Although FCM performs well in cluster detection, it assigns each element a single membership value per cluster, so the memberships cannot indicate how well an element is clustered with respect to each individual variable. To address this problem, a multivariate version of the fuzzy c-means algorithm has been proposed. That version, however, does not take into account that each variable may carry a different relevance weight and that this weight may differ from one cluster to another. Here, we propose two weighted multivariate fuzzy c-means algorithms. The weights represent the importance of each variable for each cluster and are intended to improve clustering quality. Furthermore, we propose tools, based on suitable dispersion measures, for interpreting the fuzzy partition and the fuzzy clusters obtained by multivariate fuzzy c-means methods. These tools measure the overall quality of the partition, the homogeneity of the clusters and the role of the different variables in the cluster formation process. To evaluate the proposed algorithms against established methods from the clustering literature, experiments are performed on synthetic and UCI repository data sets; the results show the usefulness of the weighted algorithms.
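To make the limitation of standard FCM concrete, the following is a minimal sketch of the textbook FCM membership update only, not of the multivariate or weighted algorithms proposed here. The function name `fcm_memberships`, the toy data and the fuzzifier value `m = 2` are assumptions chosen for illustration. It shows that each element receives a single degree per cluster, with no breakdown by variable.

```python
def fcm_memberships(points, centers, m=2.0):
    """Textbook FCM membership update: one degree per (element, cluster) pair.

    Hypothetical helper for illustration; `m` is the fuzzifier (m > 1).
    """
    exponent = 1.0 / (m - 1.0)
    memberships = []
    for x in points:
        # Squared Euclidean distance from x to every cluster prototype.
        d2 = [sum((xj - cj) ** 2 for xj, cj in zip(x, c)) for c in centers]
        if any(d == 0.0 for d in d2):
            # x coincides with a prototype: share full membership among matches.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            row = [h / sum(hits) for h in hits]
        else:
            # u_ik is proportional to (1 / d2_ik)^(1 / (m - 1)), normalized over clusters.
            inv = [(1.0 / d) ** exponent for d in d2]
            row = [v / sum(inv) for v in inv]
        memberships.append(row)
    return memberships


# Toy data (assumed): three 2-variable elements and two fixed prototypes.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0)]
centers = [(0.0, 0.0), (4.0, 4.0)]
u = fcm_memberships(points, centers)
for x, row in zip(points, u):
    print(x, [round(v, 3) for v in row])
```

Each row of `u` sums to 1 across clusters, and nothing in it says which of the two variables drives the assignment. That per-variable information is what the multivariate formulation described above is meant to expose.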
