Abstract
Noisy labeled data are a rich source of information that is often easily accessible and cheap to obtain, but label noise can have many negative consequences if not accounted for. How to fully utilize noisy labels has been studied extensively within the framework of standard supervised machine learning over several decades. However, very little research has been conducted on the challenge posed by noisy labels in non-standard settings, including situations where only a fraction of the samples are labeled (semi-supervised) and each high-dimensional sample is associated with multiple labels. In this work, we present a novel semi-supervised, multi-label dimensionality reduction method that effectively utilizes information from both noisy multi-labels and unlabeled data. With the proposed noisy multi-label semi-supervised dimensionality reduction (NMLSDR) method, the noisy multi-labels are denoised and the unlabeled data are labeled simultaneously via a specially designed label propagation algorithm. NMLSDR then learns a projection matrix for reducing the dimensionality by maximizing the dependence between the enlarged and denoised multi-label space and the features in the projected space. Extensive experiments on synthetic data, benchmark datasets, and a real-world case study demonstrate the effectiveness of the proposed algorithm and show that it outperforms state-of-the-art multi-label feature extraction algorithms.
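The two stages described above can be illustrated with a minimal sketch: a generic graph-based label propagation step (not the paper's exact propagation rule), followed by an HSIC-style projection that maximizes the dependence between the propagated label space and the projected features. All function names and parameter choices here (`alpha`, `sigma`, the linear label kernel) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def propagate_labels(X, Y, alpha=0.9, n_iter=100, sigma=1.0):
    """Generic graph-based label propagation (a sketch, not the exact
    NMLSDR rule). X: (n, d) features; Y: (n, c) multi-label indicator
    matrix with all-zero rows for unlabeled samples."""
    # Gaussian affinity matrix with zeroed diagonal.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    dinv = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    S = dinv[:, None] * W * dinv[None, :]
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        # Mix neighborhood evidence with the (noisy) initial labels,
        # which simultaneously denoises labels and labels unlabeled data.
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return F

def hsic_projection(X, F, k):
    """Projection maximizing an HSIC-style dependence between projected
    features and the propagated label space: top-k eigenvectors of
    X^T H K H X, with H the centering matrix and K = F F^T a linear
    label kernel."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K = F @ F.T                      # linear kernel on the label space
    M = X.T @ H @ K @ H @ X          # (d, d) dependence matrix
    _, vecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    return vecs[:, -k:]              # top-k projection directions

# Toy usage: two well-separated clusters, one labeled point per class.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
Y = np.zeros((6, 2))
Y[0, 0] = 1.0   # only samples 0 and 3 carry labels
Y[3, 1] = 1.0
F = propagate_labels(X, Y)
P = hsic_projection(X, F, k=1)      # project 2-D features down to 1-D
Z = X @ P
```

The eigen-decomposition step mirrors other HSIC-based multi-label reducers (e.g. MDDM); the key difference claimed for NMLSDR is that the label matrix fed into it has first been enlarged and denoised by the propagation step.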
Highlights
Supervised machine learning crucially relies on the accuracy of the observed labels associated with the training samples [1,2,3,4,5,6,7,8,9,10]
It can be seen that the classes are better separated and more compact in the noisy multi-label semi-supervised dimensionality reduction (NMLSDR) embedding than in the semi-supervised multi-label dimensionality reduction (SSMLDR) embedding
In this paper we have introduced the NMLSDR method, a dimensionality reduction method for partially and noisily labeled multi-label data
Summary
Supervised machine learning crucially relies on the accuracy of the observed labels associated with the training samples [1,2,3,4,5,6,7,8,9,10]. Observed labels may be corrupted and hence do not necessarily coincide with the true class of the samples. Such inaccurate labels are referred to as noisy [2, 4, 11]. Noisy labels may result from the use of frameworks such as anchor learning [12, 13] or silver standard learning [14], which have received interest, for instance, in healthcare analytics [15, 16]. A review of various sources of label noise can be found in [2].