Abstract

The global scaling parameter σ may cause the classical NJW spectral clustering algorithm to fail to discover the true clustering of a data set, especially when the data set contains multiple scales. Although the Self-Tuning spectral clustering algorithm overcomes this weakness of NJW by introducing a local scaling parameter σ_i for each point i, the local scaling parameter can be distorted by outliers. To avoid the deficiencies of both NJW and Self-Tuning, this paper proposes a local standard deviation spectral clustering algorithm, named SCSD for short. SCSD defines the local standard deviation scaling parameter σ_std_i as the standard deviation computed over point i and its top p nearest neighbors, and uses it in place of the local scaling parameter σ_i of Self-Tuning. As a consequence, the affinity matrix in SCSD reflects the original distribution of a data set as faithfully as possible. The power of the proposed SCSD was tested on benchmark data sets, including challenging synthetic data sets and real-world data sets from the UCI machine learning repository, as well as on synthetically generated large data sets with noise. Its performance was compared with that of NJW and Self-Tuning in terms of the popular benchmark metrics accuracy (Acc), Adjusted Mutual Information (AMI), and Adjusted Rand Index (ARI). The extensive experimental results demonstrate that the proposed SCSD algorithm is superior to NJW and Self-Tuning, finds the true distribution of the data sets as faithfully as possible, and can be applied to detect patterns in big data.
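To make the scaling idea concrete, the following is a minimal sketch of how a local standard-deviation-scaled affinity matrix could be built. It assumes σ_std_i is the standard deviation of the distances from point i to its p nearest neighbors, and reuses the Self-Tuning-style affinity form exp(-d_ij² / (σ_i σ_j)); the paper's exact definitions may differ, and `scsd_affinity` is a hypothetical name, not the authors' code.

```python
import numpy as np

def scsd_affinity(X, p=7):
    """Affinity matrix with a local standard-deviation scaling parameter.

    sigma_std_i is taken here as the standard deviation of the distances
    from point i to its p nearest neighbors (an assumed reading of the
    abstract), substituted for sigma_i in the Self-Tuning affinity
    A_ij = exp(-d_ij^2 / (sigma_i * sigma_j)).
    """
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = X.shape[0]
    sigma = np.empty(n)
    for i in range(n):
        # Distances to the p nearest neighbors, excluding the point itself.
        nn = np.sort(D[i])[1:p + 1]
        sigma[i] = nn.std()
    # Locally scaled affinity; small constant guards against zero sigma.
    A = np.exp(-D ** 2 / (np.outer(sigma, sigma) + 1e-12))
    np.fill_diagonal(A, 0.0)
    return A
```

Such an affinity matrix would then feed into the usual NJW pipeline (normalized Laplacian, top eigenvectors, k-means on the embedded rows).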
