Abstract
The K-means algorithm is among the most commonly used data clustering methods. However, standard K-means operates only in the input space and is applicable only when clusters are linearly separable. Kernel K-means, which extends K-means into a kernel space, can capture nonlinear structures and identify arbitrarily shaped clusters. However, kernel methods typically operate on the kernel matrix of the data, which scales poorly with the number of data points, or suffer from a high clustering cost due to repeated calculation of kernel values. Another issue is that such algorithms access the data only through evaluations of K(x_i, x_j), which limits the operations that can be performed on the data during clustering. This paper proposes a method that combines the advantages of the linear and nonlinear approaches by deriving corresponding approximate finite-dimensional feature maps based on spectral analysis. Applying approximate finite-dimensional feature maps has previously been discussed only in the context of Support Vector Machine (SVM) problems. We suggest using this approach in the kernel K-means context, as it does not require storing a huge kernel matrix in memory, computes cluster centers more efficiently, and accesses the data explicitly in the feature space, thus taking advantage of K-means extensions in that space. We demonstrate that our Explicit Kernel Minkowski Weighted K-means (Explicit KMWK-means) method achieves high accuracy in terms of cluster recovery in the new space by additionally applying a Minkowski exponent and feature weights. The proposed method is evaluated on four benchmark data sets, and its performance is compared with commonly used kernel clustering approaches. Experiments show that the proposed method consistently achieves superior clustering performance while reducing memory consumption.
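As an illustration of the general idea (a minimal sketch, not the paper's exact algorithm), the following Python code clusters data after an explicit approximate feature map. It assumes an RBF kernel approximated with scikit-learn's RBFSampler (random Fourier features) and omits the Minkowski exponent and feature weights that distinguish Explicit KMWK-means.

```python
# Sketch: K-means after an explicit approximate feature map.
# Assumes an RBF kernel; RBFSampler approximates it with random Fourier
# features, so no n-by-n kernel matrix is ever formed in memory.
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.randn(500, 10)  # toy data; replace with a real data set

# Explicit map into an approximate feature space: memory grows with
# n * n_components instead of n^2 for the full kernel matrix.
feature_map = RBFSampler(gamma=0.5, n_components=200, random_state=0)
Z = feature_map.fit_transform(X)

# Ordinary K-means in the approximate feature space plays the role of
# kernel K-means, with cluster centers computed directly from Z.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)
```

Because the data are accessed explicitly in the feature space, K-means extensions (such as the feature weighting used by the paper) can be applied there directly.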
Highlights
Clustering can be considered the most important unsupervised learning problem
Two standard metrics were used to measure clustering performance: Normalized Mutual Information (NMI) and Purity (a minimal computation of both is sketched after this list)
We propose a kernel K-means method based on explicit feature maps, with the clustering then performed in the feature space
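For concreteness, here is a small sketch of how the two evaluation metrics can be computed. NMI comes from scikit-learn; purity_score is a hypothetical helper written for this illustration, not a library function.

```python
# Sketch: computing NMI and Purity for a clustering against true labels.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

def purity_score(labels_true, labels_pred):
    # Purity: assign each cluster to its majority class, then report the
    # fraction of points that fall in their cluster's majority class.
    cm = contingency_matrix(labels_true, labels_pred)
    return np.sum(np.amax(cm, axis=0)) / np.sum(cm)

labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 1, 2, 2, 2]
print("NMI   :", normalized_mutual_info_score(labels_true, labels_pred))
print("Purity:", purity_score(labels_true, labels_pred))  # 5/6 here
```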
Summary
Clustering can be considered the most important unsupervised learning problem. Clustering methods are used to determine the intrinsic grouping in a set of unlabeled data. The K-means algorithm only works reasonably well when 1) clusters can be separated by hyperplanes and 2) each data point belongs to the closest cluster center. If one of these principles does not hold, the standard K-means algorithm will likely not give a good result. Kernel-based clustering methods overcome these limitations by using an appropriate non-linear mapping to a higher-dimensional feature space. This enables the K-means algorithm to partition data points with a linear separator in the new space, which corresponds to a non-linear separator in the original space. Various studies [24, 7, 6] claim that different kernel-based clustering methods show results similar to those of kernel K-means.
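To make the limitation concrete, the following illustrative sketch (our own toy example, not taken from the paper) contrasts standard K-means with K-means applied after an explicit approximate RBF feature map on the classic two-circles data set; the Nystroem map and the gamma value are assumptions chosen for the demo.

```python
# Sketch: standard K-means vs. K-means in an explicit RBF feature space
# on two concentric circles, which no hyperplane can separate.
from sklearn.datasets import make_circles
from sklearn.kernel_approximation import Nystroem
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# In the input space, K-means typically splits the circles in half.
linear_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Explicit approximate RBF feature map, then ordinary K-means: the linear
# separator in the new space is non-linear back in the original space.
Z = Nystroem(kernel="rbf", gamma=5.0, n_components=100,
             random_state=0).fit_transform(X)
kernel_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)

print("NMI, input space  :", normalized_mutual_info_score(y, linear_labels))
print("NMI, feature space:", normalized_mutual_info_score(y, kernel_labels))
```

With a suitable gamma, the feature-space run typically recovers the two rings almost perfectly, while the input-space run does not, mirroring the limitation described above.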