Abstract

In this paper, we propose confidence-weighted safe semi-supervised clustering, in which prior knowledge is given in the form of class labels. In some applications, users may label some samples incorrectly. Our basic idea is therefore that different labeled samples should carry different confidences, and hence different degrees of influence on the clustering result. Our algorithm first partitions the dataset with unsupervised clustering and computes the normalized confusion matrix Nc. Nc is used to estimate the safe confidence of each labeled sample, under the assumption that a correctly clustered sample should have a high confidence. We then construct a local graph that models the relationship between each labeled sample and its nearest unlabeled samples, based on the clustering results. Finally, a confidence-weighted fidelity term and a graph-based regularization term are incorporated into the objective function of unsupervised clustering. As a result, the outputs of labeled samples with high confidence are constrained to match the given prior labels, whereas the outputs of labeled samples with low confidence are forced to approach those of their local homogeneous unlabeled neighbors, as modeled by the local graph. The labeled samples are thus expected to be exploited safely, which is the goal of safe semi-supervised clustering. To verify the effectiveness of our algorithm, we carry out experiments on several datasets, comparing against unsupervised and semi-supervised clustering methods, and obtain promising results.
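
The confidence-estimation step can be illustrated with a minimal sketch. The abstract does not state how Nc is normalized or how a confidence is read from it, so the choices here are assumptions: the matrix is row-normalized over the given class labels, and the confidence of a labeled sample is the entry of Nc at (its given label, its assigned cluster). The function names `normalized_confusion` and `label_confidences` are hypothetical and do not come from the paper.

```python
def normalized_confusion(labels, clusters, n_classes, n_clusters):
    """Row-normalized confusion matrix between given labels and cluster ids.

    Assumption: rows index the given class labels, columns index clusters,
    and each row is divided by its sum.
    """
    counts = [[0] * n_clusters for _ in range(n_classes)]
    for y, c in zip(labels, clusters):
        counts[y][c] += 1
    nc = []
    for row in counts:
        total = sum(row)
        # Leave an all-zero row for a class with no labeled samples.
        nc.append([v / total if total else 0.0 for v in row])
    return nc


def label_confidences(labels, clusters, nc):
    """Assumed confidence rule: look up Nc[given label][assigned cluster]."""
    return [nc[y][c] for y, c in zip(labels, clusters)]


# Toy data: six labeled samples, two classes, two clusters.
# The third sample carries label 0 but was placed in cluster 1.
labels = [0, 0, 0, 1, 1, 1]
clusters = [0, 0, 1, 1, 1, 1]

nc = normalized_confusion(labels, clusters, n_classes=2, n_clusters=2)
conf = label_confidences(labels, clusters, nc)
```

Under these assumptions, the third sample, whose label disagrees with the cluster that most same-label samples fall into, receives a lower confidence than the others. Such a sample is the kind that the low-confidence branch of the objective would pull toward its unlabeled neighbors instead of its given label.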
