Abstract
We address the problem of unsupervised clustering of multidimensional data when the number of clusters is not known a priori. The proposed iterative approach is a stochastic extension of the $k$NN density-based clustering (knnclust) method: it assigns objects to clusters randomly by sampling a posterior class label distribution. In our approach, contextual class-conditional distributions are estimated from a $k$-nearest-neighbor graph and are iteratively updated to account for the current cluster labeling. Posterior probabilities are also slightly reinforced to accelerate convergence to a stationary labeling. A stopping criterion based on a measure of clustering entropy is defined using the Kozachenko-Leonenko differential entropy estimator, computed from the current class-conditional entropies. One major advantage of our approach lies in its ability to provide an estimate of the number of clusters present in the data set. We consider the application of our approach to the clustering of real hyperspectral image data. Our algorithm is compared with other unsupervised clustering approaches, namely affinity propagation (ap), knnclust, and Nonparametric Stochastic Expectation Maximization (npsem), and is shown to improve the correct classification rate in most experiments.
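For readers unfamiliar with the Kozachenko-Leonenko estimator mentioned above, the following is a minimal sketch of its standard textbook form, $\hat{H} = \psi(N) - \psi(k) + \log V_d + \frac{d}{N}\sum_{i=1}^{N} \log \rho_i$, where $\rho_i$ is the Euclidean distance from point $i$ to its $k$-th nearest neighbor and $V_d$ is the volume of the unit ball in $d$ dimensions. The function names `digamma_int` and `kl_entropy`, the brute-force neighbor search, and the choice $k = 3$ are illustrative assumptions; the abstract does not specify how the authors implement the estimator or how the per-cluster entropies are combined into the stopping criterion.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{i=1}^{n-1} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))


def kl_entropy(points, k=3):
    """Kozachenko-Leonenko estimate (in nats) of the differential entropy of a sample.

    Assumes the points are distinct (a zero k-th neighbor distance would make
    the logarithm undefined) and that len(points) > k.
    """
    n = len(points)
    d = len(points[0])
    # Log-volume of the d-dimensional unit Euclidean ball.
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    log_dist_sum = 0.0
    for i, p in enumerate(points):
        # Brute-force distances to all other points; O(n^2) overall, fine for a sketch.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        log_dist_sum += math.log(dists[k - 1])  # distance to the k-th nearest neighbor
    return digamma_int(n) - digamma_int(k) + log_unit_ball + (d / n) * log_dist_sum


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical stand-in for the objects currently assigned to one cluster.
    cluster = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(300)]
    print("estimated class-conditional entropy (nats):", kl_entropy(cluster, k=3))
```

In a clustering loop, one would presumably call such a function once per current cluster to obtain the class-conditional entropies. For large data sets, the brute-force search would normally be replaced by a tree-based or graph-based $k$NN query.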