Data with complicated geometry and nonlinear separability pose common challenges for clustering algorithms. In this article, we first construct the Mahalanobis distance in the kernel space and then propose a novel fuzzy clustering model with a kernelized Mahalanobis distance (KMD), namely KMD-FC. The key contributions of KMD-FC are as follows: first, the KMD matrix is constructed by transforming the Euclidean distance kernel matrix, which effectively avoids the “curse of dimensionality” incurred by explicitly calculating the sample covariance matrix in the kernel space; second, the Gustafson–Kessel (GK) fuzzy C-means algorithm is kernelized for the first time, which is critical for extending the GK algorithm to nonlinear classification tasks; finally, the overall distribution of samples in the kernel space after kernel mapping is taken into account to improve the generalizability of the proposed KMD-FC clustering method. Comprehensive experiments on a wide range of datasets, including synthetic datasets and UCI Machine Learning Repository datasets, show that the proposed clustering algorithm outperforms the state-of-the-art methods it was compared with.
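As a minimal illustration of the kernel-matrix viewpoint (not the KMD construction itself, which the abstract does not spell out), squared Euclidean distances between mapped samples can be obtained from a kernel matrix K alone through the standard identity d²(i, j) = K[i, i] + K[j, j] − 2·K[i, j], without ever forming the mapped features. The Gaussian kernel, the parameter `gamma`, and the function names below are assumptions chosen only for this sketch.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2).

    The kernel choice and gamma are illustrative assumptions.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def kernel_space_sq_distances(K):
    """Squared Euclidean distances between mapped samples, using K only.

    Uses d^2(i, j) = K[i, i] + K[j, j] - 2 * K[i, j].
    """
    diag = np.diag(K)
    D2 = diag[:, None] + diag[None, :] - 2.0 * K
    return np.maximum(D2, 0.0)  # clip tiny negatives caused by round-off

# Toy data: three points in the plane (hypothetical values).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel_matrix(X, gamma=0.5)
D2 = kernel_space_sq_distances(K)
```

With a linear kernel (`K = X @ X.T`) the same function returns the ordinary squared Euclidean distances in the input space, which is a quick sanity check on the identity.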