Fuzzy C-Means clustering algorithm for data with unequal cluster sizes and contaminated with noise and outliers: Review and development

Salar Askari

doi:10.1016/j.eswa.2020.113856

https://doi.org/10.1016/j.eswa.2020.113856

Abstract

Clustering algorithms aim to find dense regions of data based on similarities and dissimilarities between data points. Noise and outliers enter the computations of these algorithms just as the actual data points do, which leads to inaccurate and misplaced cluster centers. The same problem arises when cluster sizes differ, which moves the centers of small clusters towards large clusters. In engineering and physics, the mass of the data points matters as well as their location: a non-uniform mass distribution displaces cluster centers towards heavier clusters even if the clusters are of identical size and the data are noise-free. The Fuzzy C-Means (FCM) algorithm, which suffers from these problems, is the most popular fuzzy clustering algorithm and has been the subject of numerous studies and developments, though improvements are still marginal. This work revises the FCM algorithm to make it applicable to data with unequal cluster sizes, noise and outliers, and non-uniform mass distribution. The Revised FCM (RFCM) algorithm employs adaptive exponential functions to eliminate the impact of noise and outliers on the cluster centers, and modifies the constraint of the FCM algorithm to prevent large or heavier clusters from attracting the centers of small clusters.
The paper reviews several algorithms and discusses their mathematical structures, including Possibilistic Fuzzy C-Means (PFCM), Possibilistic C-Means (PCM), Robust Fuzzy C-Means (FCM-σ), Noise Clustering (NC), Kernel Fuzzy C-Means (KFCM), Intuitionistic Fuzzy C-Means (IFCM), Robust Kernel Fuzzy C-Means (KFCM-σ), Robust Intuitionistic Fuzzy C-Means (IFCM-σ), Kernel Intuitionistic Fuzzy C-Means (KIFCM), Robust Kernel Intuitionistic Fuzzy C-Means (KIFCM-σ), Credibilistic Fuzzy C-Means (CFCM), Size-insensitive integrity-based Fuzzy C-Means (siibFCM), Size-insensitive Fuzzy C-Means (csiFCM), Subtractive Clustering (SC), Density Based Spatial Clustering of Applications with Noise (DBSCAN), Gaussian Mixture Models (GMM), Spectral clustering, and Outlier Removal Clustering (ORC). Some of these algorithms are suitable for noisy data and others are designed for data with unequal cluster sizes. The study shows that the RFCM algorithm works for both cases and outperforms both categories of algorithms.
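
To make the outlier problem described in the abstract concrete, the following is a minimal sketch of the standard (unrevised) FCM updates, not the RFCM algorithm proposed in the paper. The function name `fcm`, the fuzzifier value `m = 2`, the initial centers, and the toy one-dimensional data are all assumptions chosen for illustration. The sketch relies on the usual FCM constraint that the memberships of each point sum to 1, which forces even a distant outlier to contribute to the centers.

```python
import math

def fcm(points, centers, m=2.0, iters=200, tol=1e-9):
    """Standard Fuzzy C-Means on a list of equal-length tuples (illustrative).

    `centers` is the initial guess. Returns (centers, memberships), where
    memberships[k][i] is the degree of point k in cluster i, computed from
    the centers of the last iteration before the final center update.
    """
    power = 2.0 / (m - 1.0)
    dim = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: inverse-distance weights normalised per point,
        # so every row sums to 1 (the standard FCM constraint).
        u = []
        for x in points:
            d = [max(math.dist(x, v), 1e-12) for v in centers]
            w = [dk ** -power for dk in d]
            s = sum(w)
            u.append([wi / s for wi in w])
        # Center update: mean of all points weighted by membership**m.
        new_centers = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wk * x[j] for wk, x in zip(w, points)) / total
                for j in range(dim)
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u

# Toy data (assumed): two equal-sized 1-D clusters around 0 and 10.
clean = [(-1.0,), (-0.5,), (0.0,), (0.5,), (1.0,),
         (9.0,), (9.5,), (10.0,), (10.5,), (11.0,)]
noisy = clean + [(20.0,)]  # one added outlier

init = [(2.0,), (8.0,)]
clean_centers, clean_u = fcm(clean, init)
noisy_centers, noisy_u = fcm(noisy, init)

print("centers without outlier:", clean_centers)
print("centers with outlier:   ", noisy_centers)
```

In this toy setting the single outlier pulls the nearer center away from its cluster, because the sum-to-one constraint gives the outlier a large membership in that cluster. This is the behaviour the abstract says RFCM is designed to suppress.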
