Robust k-Means-Type Clustering for Noisy Data.

Xi Xiao, Guojun Gan, Bin Zhang, Qing Li, Shutao Xia, Hailong Ma

doi:10.1109/tnnls.2024.3392211

Abstract

Data clustering is a fundamental machine learning task that seeks to partition a dataset into homogeneous groups. However, real data usually contain noise, which poses significant challenges to clustering algorithms. In this article, motivated by how the k-means algorithm is derived from a Gaussian mixture model (GMM), we propose a robust k-means-type algorithm, named k-means-type clustering based on t-distribution (KMTD), by assuming that the data points are drawn from a special multivariate t-mixture model (TMM). Because the t-distribution has heavier tails than the Gaussian distribution, the proposed algorithm is more robust to noise. Like the k-means algorithm, the proposed algorithm is simpler than those based on a full TMM. Both synthetic and real data are used to illustrate the proposed algorithm's performance and efficiency. The experimental results demonstrate that the proposed algorithm runs faster than other sophisticated algorithms and, in most cases, achieves higher accuracy than the other algorithms.
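
To give a feel for why heavier tails help, the following is a minimal sketch of a hard-assignment, k-means-style loop in which each point's contribution to its center is down-weighted by the standard E-step weight of a multivariate t-distribution, (nu + d) / (nu + squared distance). It is not the KMTD algorithm from the article: the function name `t_weighted_kmeans`, the fixed degrees of freedom `nu`, the identity scale matrix, and the toy data are all assumptions made purely for illustration.

```python
import numpy as np

def t_weighted_kmeans(X, centers, nu=3.0, n_iter=20):
    """Illustrative k-means-style loop with t-distribution-style weights.

    Assumptions (not taken from the article): hard assignments, a fixed
    degrees-of-freedom value `nu`, and an identity scale matrix.
    """
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)  # copy so the input is untouched
    n, d = X.shape
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: (n, k)
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        # Weight shrinks toward 0 as a point moves far from its center,
        # so distant (noisy) points barely move the center.
        w = (nu + d) / (nu + sq[np.arange(n), labels])
        for j in range(len(centers)):
            mask = labels == j
            if mask.any():
                centers[j] = (w[mask, None] * X[mask]).sum(axis=0) / w[mask].sum()
    return centers, labels

# Toy data: two tight clusters plus one far-away noise point at (5, 30).
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1],
    [5.0, 5.0], [5.1, 5.0], [4.9, 5.0], [5.0, 5.1], [5.0, 4.9],
    [5.0, 30.0],
])
centers, labels = t_weighted_kmeans(X, [[0.0, 0.0], [5.0, 5.0]])
plain_mean = X[labels == 1].mean(axis=0)  # what an unweighted k-means update would give
print("t-weighted center:", centers[1])
print("plain mean:       ", plain_mean)
```

In this toy setting the noise point is still assigned to the second cluster, but its small weight keeps the weighted center close to (5, 5), whereas the unweighted mean of the same points is pulled noticeably toward the outlier.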

Full Text