A New Spectral Clustering Algorithm and Its Simulation

娇 李

doi:10.12677/csa.2017.711125

Abstract

Traditional spectral clustering measures the similarity between samples with the Euclidean distance, which cannot reflect the probability distribution of the samples; for samples with a multi-peak distribution in particular, the Euclidean distance introduces a large deviation. In addition, the k-means algorithm widely used in traditional spectral clustering often produces unstable clustering results because its initial centers are set at random. To address the similarity measurement problem, this paper proposes a similarity measure that reflects the probability distribution of the samples. On this basis, the k-means step in spectral clustering is improved with a method that sets the initial centers adaptively. Experimental results show that the proposed algorithm achieves better clustering results than traditional spectral clustering on artificial data sets and on real UCI data sets.
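For orientation, the traditional baseline that the abstract criticizes can be sketched roughly as follows. This is not the proposed algorithm: it is a minimal illustration of conventional spectral clustering with a Gaussian similarity built on the Euclidean distance and a k-means step with randomly chosen initial centers. The function names, the `sigma` bandwidth, the `seed` parameter, and the choice of a symmetric normalized affinity with row normalization are all assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Similarity from Euclidean distance (the measure the abstract criticizes)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-similarity
    return W

def spectral_embedding(W, k):
    """Top-k eigenvectors of D^{-1/2} W D^{-1/2}, rows scaled to unit length."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    U = vecs[:, -k:]
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)

def kmeans_random_init(U, k, seed, n_iter=100):
    """Plain k-means; initial centers are drawn at random (assumed baseline)."""
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(X, k, sigma=1.0, seed=0):
    W = gaussian_affinity(X, sigma)
    U = spectral_embedding(W, k)
    return kmeans_random_init(U, k, seed)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
                   rng.normal(10.0, 0.3, (20, 2))])
    print(spectral_clustering(X, k=2))
```

In this sketch the two points the abstract raises correspond to `gaussian_affinity` (the Euclidean-distance similarity) and to the random draw in `kmeans_random_init` (the initial centers). On harder data, changing `seed` can change the final partition, which is the instability the paper sets out to address.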

Full Text