Abstract
Semi-supervised clustering tries to surpass the limits of unsupervised clustering using extra information contained in occasional labeled data points. However, providing such labeled samples is not always possible or easy in real world applications. A weaker, yet still very useful option is providing constraints on the unlabeled training samples, which is the focus of the Constrained Semi-Supervised (CSS) clustering. On the other hand, online learning has gained considerable amount of interests in real world problems with massive sample size or streaming behavior, as lack of memory and computational resources seriously restrict the application of the offline and batch methods. However, the existing algorithms for online CSS clustering problem either assumed that the entire dataset is available and added constraints incrementally or considered chunks of constrained data points and applied an offline CSS clustering algorithm. Thus, none of them can be categorized as a genuine online CSS clustering algorithm. In this paper, we propose CS2GS, an online CSS clustering algorithm. CS2GS is constructed by modifying the online learning process of Semi-Supervised Growing Self-Organizing Map, and converting it to an iterative constrained metric learning problem that can be solved using the Bregman׳s iterative projections. The proposed CS2GS is studied via a series of thorough tests using synthetic and real data including selections from UCI datasets and FEP – a recent bilingual corpus used for sentence aligning stage of machine translation. Experimental results show the effectiveness of CS2GS in online CSS clustering, and prove that indeed, the limits of the system accuracy may be pushed higher using unlabeled samples.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.