Abstract

Side information such as pairwise constraints generally helps improve clustering performance. However, constraints are not always error-free. When erroneous constraints are specified as side information, treating them as hard constraints can be a disadvantage, since enforcing incorrect constraints can degrade performance. In this paper, we conduct extensive experiments to investigate the influence of erroneous pairwise constraints over various document datasets. We evaluate several state-of-the-art semi-supervised clustering methods with graph representation with respect to both the type and the number of constraints. The experimental results confirm that treating pairwise constraints as hard constraints is vulnerable to erroneous ones. However, the results also reveal that the influence of erroneous constraints depends on how the constraints are exploited inside a learning algorithm.

Keywords: Spectral Cluster, Side Information, Soft Constraint, Kernel Matrix, Hard Constraint
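To make the distinction between hard and soft constraints concrete, the following is a minimal sketch, not the methods evaluated in the paper. It assumes a symmetric affinity matrix with entries in [0, 1] as the graph representation. The function names `apply_constraints` and `flip_constraints`, the blending weight `alpha`, and the error model (swapping the type of a random fraction of constraints) are all hypothetical choices made for illustration.

```python
import numpy as np


def apply_constraints(W, must_link, cannot_link, mode="hard", alpha=0.5):
    """Return a copy of the affinity matrix W with pairwise constraints applied.

    Assumed illustration only:
    mode="hard": must-link pairs are set to 1, cannot-link pairs to 0.
    mode="soft": each constrained entry moves toward its target by a fraction alpha.
    """
    W = np.asarray(W, dtype=float).copy()
    for pairs, target in ((must_link, 1.0), (cannot_link, 0.0)):
        for i, j in pairs:
            if mode == "hard":
                new_value = target
            else:
                new_value = (1.0 - alpha) * W[i, j] + alpha * target
            # Keep the matrix symmetric.
            W[i, j] = W[j, i] = new_value
    return W


def flip_constraints(must_link, cannot_link, error_rate, rng):
    """Simulate erroneous constraints by swapping the type of a random fraction."""
    noisy_ml, noisy_cl = [], []
    for pair in must_link:
        (noisy_cl if rng.random() < error_rate else noisy_ml).append(pair)
    for pair in cannot_link:
        (noisy_ml if rng.random() < error_rate else noisy_cl).append(pair)
    return noisy_ml, noisy_cl
```

Under these assumptions, a flipped must-link pair fully overwrites the original affinity in hard mode, whereas in soft mode part of the original affinity survives. This is one intuition for why hard constraints can be more vulnerable to errors.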
