Abstract
Pairwise constraints are a typical form of class information used in semi-supervised clustering. Although various methods have been proposed to combine unlabeled data with pairwise constraints, most of them rely on adapting existing clustering frameworks, such as GMM or k-means, to the semi-supervised setting. As a consequence, pairwise relations have to be translated into a particular clustering model, which often contradicts expert knowledge. In this paper we propose a novel semi-supervised method, d-graph, which does not assume any predefined structure of clusters. We follow a discriminative approach and use the logistic function to directly model the posterior probabilities p(k|x) that a point x belongs to the kth cluster. Using these posterior probabilities, we maximize the expected probability that the pairwise constraints are preserved. To include unlabeled data in our clustering objective function, we introduce additional pairwise constraints so that nearby points are more likely to appear in the same cluster. The proposed model can be easily optimized with gradient techniques and can be kernelized, which allows it to discover arbitrary shapes and structures in data. Experiments performed on various types of data demonstrate that d-graph obtains better clustering results than the state-of-the-art methods used for comparison.
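To make the objective described above more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the posteriors p(k|x) come from a linear softmax (multiclass logistic) model, and that the probability of two points sharing a cluster is the sum over k of p(k|x)p(k|y), which holds if the two assignments are treated as independent. The function names, the parameters `W` and `b`, and the toy data are all hypothetical; the neighborhood-based constraints for unlabeled data, the kernelization, and the gradient optimization are omitted.

```python
import numpy as np

def posteriors(X, W, b):
    """Softmax (multiclass logistic) posteriors p(k|x) for each row of X."""
    logits = X @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def same_cluster_prob(P, pairs):
    """Assumed pair model: sum_k p(k|x_i) * p(k|x_j) for each pair (i, j)."""
    i, j = pairs[:, 0], pairs[:, 1]
    return np.sum(P[i] * P[j], axis=1)

def expected_constraint_score(P, must_link, cannot_link):
    """Average probability that each pairwise constraint is preserved."""
    ml = same_cluster_prob(P, must_link)           # pair should share a cluster
    cl = 1.0 - same_cluster_prob(P, cannot_link)   # pair should be separated
    return (ml.sum() + cl.sum()) / (len(ml) + len(cl))

# Toy data: two well-separated groups and hand-picked (hypothetical) parameters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.1]])
W = np.array([[-1.0, 1.0], [-1.0, 1.0]])
b = np.array([5.0, -5.0])
must_link = np.array([[0, 1], [2, 3]])
cannot_link = np.array([[0, 2]])

P = posteriors(X, W, b)
score = expected_constraint_score(P, must_link, cannot_link)
print(score)
```

In a full method, `W` and `b` would be fitted by maximizing a score of this kind with a gradient-based optimizer instead of being fixed by hand as they are here.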