Abstract
We propose a novel Mean-Shift method for data clustering, called Robust Mean-Shift (RMS). A new update equation for the point iterates is proposed, mixing those of the standard Mean-Shift (MS) and the Blurring Mean-Shift (BMS). Despite its simplicity, the proposed method has not been studied so far. RMS can be set up in both a kernel-based and a nearest-neighbor (NN)-based fashion. Since the update rule of RMS is closer to that of BMS, convergence of the point iterates is conjectured based on Chen's BMS convergence theorem. Experimental results on synthetic and real datasets show that, in several cases, RMS outperforms MS and BMS in the clustering task. In addition, RMS exhibits larger attraction basins than MS and BMS for identical parametrization; consequently, its kernel variant requires a smaller aperture of the kernel function, and its NN variant a smaller number of nearest neighbors, than MS or BMS to achieve optimal clustering results. Finally, the NN version of RMS does not need a convergence threshold to stop the iterations, unlike the NN-BMS algorithm.
Highlights
Data clustering is a type of unsupervised learning which consists of automatically grouping data points having similar characteristics into identified clusters, without training sample points.
We find experimentally that Robust Mean-Shift (RMS) requires a smaller bandwidth parameter and a smaller number of nearest neighbors than MS and Blurring Mean-Shift (BMS) to achieve comparable or even better clustering results; this is especially interesting for speeding up the computation of the point iterates and, in the case of the NN-RMS variant, for reducing the size of the NN graph compared to those required by the NN variants of MS and BMS.
This approach differs from the standard Mean-Shift (MS) and Blurring Mean-Shift (BMS) algorithms in its update equation; the classical MS and BMS update rules are recalled below for reference.
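For context, and as an assumption about notation (the paper's own equations are not reproduced on this page), the classical update rules that RMS mixes are, for a kernel K with bandwidth h, original data points x_j and iterates y_i^(t):

\[
\text{MS:}\;\; y_i^{(t+1)} = \frac{\sum_{j=1}^{n} K\!\big((y_i^{(t)} - x_j)/h\big)\, x_j}{\sum_{j=1}^{n} K\!\big((y_i^{(t)} - x_j)/h\big)},
\qquad
\text{BMS:}\;\; x_i^{(t+1)} = \frac{\sum_{j=1}^{n} K\!\big((x_i^{(t)} - x_j^{(t)})/h\big)\, x_j^{(t)}}{\sum_{j=1}^{n} K\!\big((x_i^{(t)} - x_j^{(t)})/h\big)}.
\]

In MS the kernel averages are always taken over the fixed original dataset, whereas in BMS every point moves and the averages are taken over the current, "blurred" dataset. The specific way RMS combines these two rules is defined in the paper itself and is not restated here.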
Summary
Data clustering is a type of unsupervised learning which consists of automatically grouping data points having similar characteristics into identified clusters, without training sample points. Despite several decades of research, clustering remains a challenging task for many applications because of the increasing size (number of data points) and dimensionality (number of features) of modern datasets. Most popular methods claimed as unsupervised still require significant prior knowledge about the data structure, such as the number of clusters to be found. We propose a novel approach to the classical Mean-Shift algorithm, focused on data clustering; the kernel density estimation (KDE) problem is not investigated. Our algorithm is based on iterative updates of moving data points, similar to MS and BMS, but the update equation of RMS fundamentally differs from those of both methods.
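To make the two classical update steps that RMS mixes concrete (this sketch illustrates standard MS and BMS only, not the RMS rule itself, which is defined in the full paper), here is a minimal example assuming a Gaussian kernel of bandwidth h; the function names and toy data are hypothetical:

import numpy as np

def gaussian_weights(queries, points, h):
    """Gaussian kernel weights between each query and every point (bandwidth h)."""
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / h**2)

def ms_step(Y, X, h):
    """Standard Mean-Shift step: iterates Y move toward the kernel-weighted
    mean of the FIXED original dataset X."""
    W = gaussian_weights(Y, X, h)
    return (W @ X) / W.sum(axis=1, keepdims=True)

def bms_step(X, h):
    """Blurring Mean-Shift step: every point moves toward the kernel-weighted
    mean of the CURRENT (moving) dataset."""
    W = gaussian_weights(X, X, h)
    return (W @ X) / W.sum(axis=1, keepdims=True)

# Usage on toy 2-D data: two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
Y = X.copy()
for _ in range(20):
    Y = ms_step(Y, X, h=0.5)   # MS: the reference set X stays fixed
# After convergence the iterates collapse near the density modes; clusters are
# read off by grouping iterates that end up (numerically) at the same location.

The design choice that distinguishes the two classical rules is whether the kernel averages are computed over the fixed data (MS) or over the moving points (BMS); RMS proposes an update that mixes these two behaviors.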