Abstract

A semi-supervised agglomerative hierarchical clustering method based on dynamically updating constraints is proposing in this research. Following the existing semi-supervised clustering algorithm, this method uses the must-link and cannot-link constraints. Instead of using the idea that the instances with must-link constraints are pre-clustered before agglomerating with the others, this method employs a more general and reasonable process. Firstly, must-link and cannot-link constraints are expanded to compose a constraints closure. Then, a standard agglomeration instructed by cannot-link constraints is processed. During this procedure, the must-link and cannot-link are dynamically updated according to the intermediate clustering results. This updating process guarantees the validity of the final results. The fundamental advantage of this method is omitting the pre-clustering process of the instances with must-link constraints. This modification ensures that data points gain a more reasonable agglomeration order, which may result in a significant improvement on the clustering results. This research also introduces an implementation of this model based on Ward0s method, leading to the C-Ward algorithm. The experimental analyses on both Artificial simulated datasets and real world datasets show that this method is much better than the others.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call