Abstract
Spectral Clustering (SC) is a kernel method to cluster data objects using eigenvectors derived from the data. One fundamental issue that cause poor cuts in SC is its sensitivity to outliers. Another fundamental problem is how to determine the kernel bandwidth from the data. In fact, these two problems are also closely related. One cannot be solved before solving the other. The answer lies in robust and nonparametric estimators of the data density. We propose Reverse Soft K-Nearest Neighbor Density Estimation (RSKNN) that determines the density around a data sample, thus this sample's potential (other used terms are weight or entropy), using all the other samples' nearest neighbors' scatter properties on the contrary to the common practice of using the nearest neighbors of a sample itself to determine its own density. The basic idea behind this can be summarized as “every sample have k neighbors that are the nearest but not every point can be in many points' k-nearest neighborhood”. To demonstrate the use of it, we apply it to SC. Our package to use in SC consists of using RSKNN for estimating the density and the samples with high potential to be cluster centers helps in: 1) spectral decomposition phase to improve the generalization of the spectral cut criterion, 2) robust calculation of covariance matrix to be used in distance calculations, and 3) automatically determining the kernel bandwidth.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.