In this paper, we propose a safe semi-supervised clustering algorithm based on Dempster–Shafer (D–S) evidence theory. The motivation is that D–S evidence theory can fuse multiple base clustering results and thereby yield robust confidence estimates for mislabeled samples. First, the proposed algorithm constructs multiple base clusterings using fuzzy c-means and kernel fuzzy c-means, and selects the base clusterings with good performance according to a clustering validity function. Then, D–S evidence theory is used to fuse the results of the selected base clusterings, and the confidence of each labeled sample is calculated from the fused result. Finally, we construct a p-nearest neighbor graph that constrains the outputs of low-confidence labeled samples to be those of their p nearest unlabeled samples. The aim is to reduce the negative influence of low-confidence labeled samples and thus exploit the labeled data safely. To verify the effectiveness of the proposed algorithm, we compare it with several unsupervised and semi-supervised clustering algorithms. The results demonstrate that our algorithm achieves higher accuracy and is more stable.
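The fusion step can be illustrated with Dempster's rule of combination. The abstract does not say how fuzzy memberships become mass functions or how confidence is read off the fused result, so the following is only a minimal sketch under stated assumptions. Cluster labels are assumed to be already aligned across base clusterings. The helper `memberships_to_mass` and its reliability factor `alpha` are hypothetical: memberships are scaled by `alpha` and the remaining mass goes to the whole frame. Confidence is taken as the pignistic probability of a sample's given label, which is also an assumption.

```python
from functools import reduce
from itertools import product
from typing import Dict, FrozenSet, Sequence

Mass = Dict[FrozenSet[str], float]


def memberships_to_mass(u: Sequence[float], frame: Sequence[str], alpha: float) -> Mass:
    """Hypothetical conversion: scale fuzzy memberships (assumed to sum to 1)
    by a reliability factor alpha in [0, 1]; 1 - alpha goes to the whole frame."""
    m: Mass = {frozenset([c]): alpha * p for c, p in zip(frame, u)}
    theta = frozenset(frame)
    m[theta] = m.get(theta, 0.0) + (1.0 - alpha)
    return m


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses of focal elements, keep non-empty
    intersections, and renormalise by 1 - K (K = mass on the empty set)."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def pignistic(m: Mass, frame: Sequence[str]) -> Dict[str, float]:
    """Pignistic transform BetP: spread each focal element's mass evenly
    over its members."""
    bet = {c: 0.0 for c in frame}
    for focal, w in m.items():
        for c in focal:
            bet[c] += w / len(focal)
    return bet


def label_confidence(memberships, alphas, frame, given_label) -> float:
    """Hypothetical confidence of a labeled sample: fuse all base clusterings
    with Dempster's rule and return BetP of the sample's given label."""
    masses = [memberships_to_mass(u, frame, a) for u, a in zip(memberships, alphas)]
    fused = reduce(dempster_combine, masses)
    return pignistic(fused, frame)[given_label]


if __name__ == "__main__":
    frame = ("c1", "c2")
    # Two base clusterings that both favour c1 for this sample.
    conf = label_confidence([[0.8, 0.2], [0.6, 0.4]], [0.9, 0.8], frame, "c1")
    print(round(conf, 3))  # 0.802
```

With these inputs a sample labeled `c2` would receive a confidence below 0.2 and would be a candidate for the low-confidence treatment described in the abstract. The threshold that separates low from high confidence is not given in the abstract and is left open here.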