Abstract

Semi-supervised learning (SSL), which uses plentiful unlabeled examples to boost the performance of learning from limited labeled examples, is a powerful learning paradigm with wide real-world applications such as information retrieval and document clustering. Label propagation (LP) is a popular SSL method that propagates labels through the dataset along high-density areas defined by unlabeled examples, but it is sensitive to bridge examples. Semi-supervised K-Means uses labeled examples to initialize cluster centers that separate the classes; however, it fails on imbalanced data, that is, when the number of examples per class varies significantly. This paper proposes a novel label propagated nonnegative matrix factorization method (LPNMF) to handle cleanly labeled but biased data, and its extension LPNMF-E to handle noisily labeled data, both based on the NMF framework. LPNMF decomposes the whole dataset into the product of a basis matrix and a coefficient matrix. To propagate labels to unlabeled examples, LPNMF regards the class indicators of labeled examples as their coefficients and iteratively updates both the basis matrix and the coefficients of the unlabeled examples. LPNMF combines the merits of semi-supervised K-Means and label propagation to address their respective shortcomings. Specifically, on the one hand, LPNMF learns representative cluster centers based on the distribution of the dataset, similar to semi-supervised K-Means, and is thus robust to bridge examples. On the other hand, LPNMF propagates labels according to the affinity between examples, similar to label propagation, and thus alleviates the bias problem. Moreover, we introduce an extension, LPNMF-E, to handle the noisy-label case. LPNMF-E relaxes the constraint on labeled examples. Since each labeled example also obtains label information from the global distribution of the whole dataset and the local manifold of its neighbors, LPNMF-E outputs reliable class indicators even if a portion of the examples are incorrectly labeled. Theoretical analyses of the generalization ability of the proposed models are also provided. Experimental results on both clean and noisy labeled datasets confirm the effectiveness of LPNMF and LPNMF-E compared with both LP and representative semi-supervised K-Means algorithms.
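The core mechanism described above (fix the coefficients of labeled examples to their class indicators, then alternately update the basis matrix and the coefficients of unlabeled examples) can be illustrated with a minimal sketch. The abstract does not state the objective function or the update rules, so this sketch assumes a plain Frobenius-norm NMF with the standard multiplicative updates and omits any affinity or manifold term; it is not the authors' algorithm. The function name `label_constrained_nmf` and all parameter names are hypothetical.

```python
import numpy as np


def label_constrained_nmf(X, labels, n_classes, n_iter=300, eps=1e-9, seed=0):
    """Illustrative sketch only (assumed objective: min ||X - W H||_F^2).

    X      : (d, n) nonnegative data matrix, one example per column.
    labels : length-n integer array; class index for labeled examples,
             -1 for unlabeled examples.
    Returns the basis matrix W (d, n_classes) and coefficients H (n_classes, n).
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    labeled = labels >= 0

    # Random positive initialization of both factors.
    W = rng.random((d, n_classes)) + eps
    H = rng.random((n_classes, n)) + eps

    # Labeled examples: coefficients are their one-hot class indicators.
    H[:, labeled] = 0.0
    H[labels[labeled], np.nonzero(labeled)[0]] = 1.0

    for _ in range(n_iter):
        # Multiplicative update of the basis matrix (all columns of H used).
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Multiplicative update of the coefficients, applied only to
        # unlabeled columns so that labeled indicators stay fixed.
        H_new = H * (W.T @ X) / (W.T @ W @ H + eps)
        H[:, ~labeled] = H_new[:, ~labeled]

    return W, H
```

Under these assumptions, a predicted class for each unlabeled example can be read off as `H.argmax(axis=0)`. Because the labeled columns of `H` never change, each column of `W` is anchored to one class, which is the sense in which the basis plays the role of cluster centers in this sketch.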
