Abstract
The K-means algorithm is among the most commonly used data clustering methods. However, standard K-means operates only in the input space and is applicable only when clusters are linearly separable. Kernel K-means, which extends K-means into a kernel space, can capture nonlinear structures and identify arbitrarily shaped clusters. However, kernel methods typically operate on the kernel matrix of the data, which scales poorly with the number of data points, or suffer from a high clustering cost due to repeated calculation of kernel values. Another issue is that such algorithms access the data only through evaluations of K(x_i, x_j), which limits the operations that can be performed on the data during clustering. This paper proposes a method that combines the advantages of the linear and nonlinear approaches by deriving corresponding approximate finite-dimensional feature maps based on spectral analysis. Applying approximate finite-dimensional feature maps has previously been discussed only in the context of Support Vector Machine (SVM) problems. We suggest using this approach in the kernel K-means context, as it does not require storing a huge kernel matrix in memory, computes cluster centers more efficiently, and accesses the data explicitly in the feature space, thus taking advantage of K-means extensions in that space. We demonstrate that our Explicit Kernel Minkowski Weighted K-means (Explicit KMWK-means) method achieves high accuracy in terms of cluster recovery in the new space by additionally applying a Minkowski exponent and feature weights. The proposed method is evaluated on four benchmark data sets, and its performance is compared with commonly used kernel clustering approaches. Experiments show that the proposed method consistently achieves superior clustering performance while reducing memory consumption.
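As an illustration of the general idea (a minimal sketch, not the paper's exact algorithm), the following Python code clusters data after an explicit approximate feature map. It assumes an RBF kernel approximated with scikit-learn's RBFSampler (random Fourier features) and omits the Minkowski exponent and feature weights that distinguish Explicit KMWK-means.

```python
# Sketch: K-means after an explicit approximate feature map.
# Assumes an RBF kernel; RBFSampler approximates it with random Fourier
# features, so no n-by-n kernel matrix is ever formed in memory.
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.randn(500, 10)  # toy data; replace with a real data set

# Explicit map into an approximate feature space: memory grows with
# n * n_components instead of n^2 for the full kernel matrix.
feature_map = RBFSampler(gamma=0.5, n_components=200, random_state=0)
Z = feature_map.fit_transform(X)

# Ordinary K-means in the approximate feature space plays the role of
# kernel K-means, with cluster centers computed directly from Z.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)
```

Because the data are accessed explicitly in the feature space, K-means extensions (such as the feature weighting used by the paper) can be applied there directly.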
Highlights
Clustering can be considered the most important unsupervised learning problem
Two standard metrics were used to measure clustering performance: Normalized Mutual Information (NMI) and Purity (a minimal computation of both is sketched after this list)
We propose a kernel K-means method based on explicit feature maps, with the clustering then performed in the feature space
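For concreteness, here is a small sketch of how the two evaluation metrics can be computed. NMI comes from scikit-learn; purity_score is a hypothetical helper written for this illustration, not a library function.

```python
# Sketch: computing NMI and Purity for a clustering against true labels.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

def purity_score(labels_true, labels_pred):
    # Purity: assign each cluster to its majority class, then report the
    # fraction of points that fall in their cluster's majority class.
    cm = contingency_matrix(labels_true, labels_pred)
    return np.sum(np.amax(cm, axis=0)) / np.sum(cm)

labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 1, 2, 2, 2]
print("NMI   :", normalized_mutual_info_score(labels_true, labels_pred))
print("Purity:", purity_score(labels_true, labels_pred))  # 5/6 here
```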
Summary
Clustering can be considered the most important unsupervised learning problem. Clustering methods are used to determine the intrinsic grouping in a set of unlabeled data. The K-means algorithm only works reasonably well when 1) clusters can be separated by hyperplanes and 2) each data point belongs to the closest cluster center. If one of these principles does not hold, the standard K-means algorithm will likely not give a good result. Kernel-based clustering methods overcome these limitations by using an appropriate non-linear mapping to a higher-dimensional feature space. This enables the K-means algorithm to partition data points with a linear separator in the new space, which corresponds to a non-linear separator in the original space. Various studies [24, 7, 6] claim that different kernel-based clustering methods show results similar to those of kernel K-means.
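To make the limitation concrete, the following illustrative sketch (our own toy example, not taken from the paper) contrasts standard K-means with K-means applied after an explicit approximate RBF feature map on the classic two-circles data set; the Nystroem map and the gamma value are assumptions chosen for the demo.

```python
# Sketch: standard K-means vs. K-means in an explicit RBF feature space
# on two concentric circles, which no hyperplane can separate.
from sklearn.datasets import make_circles
from sklearn.kernel_approximation import Nystroem
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# In the input space, K-means typically splits the circles in half.
linear_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Explicit approximate RBF feature map, then ordinary K-means: the linear
# separator in the new space is non-linear back in the original space.
Z = Nystroem(kernel="rbf", gamma=5.0, n_components=100,
             random_state=0).fit_transform(X)
kernel_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)

print("NMI, input space  :", normalized_mutual_info_score(y, linear_labels))
print("NMI, feature space:", normalized_mutual_info_score(y, kernel_labels))
```

With a suitable gamma, the feature-space run typically recovers the two rings almost perfectly, while the input-space run does not, mirroring the limitation described above.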