Abstract

Representation learning has proven to play an important role in the unprecedented success of machine learning models on numerous tasks, such as machine translation, face recognition, and recommendation. Most existing representation learning approaches require large amounts of consistent and noise-free labels. However, labels are very limited in many real-world scenarios. Directly applying standard representation learning approaches to small labeled data sets easily leads to over-fitting and sub-optimal solutions. Even worse, the limited labels are usually annotated by multiple workers with diverse expertise, which introduces noise and inconsistency into the crowdsourced labels. In this paper, we propose a novel framework that learns effective representations from limited data with crowdsourced labels. We design a grouping-based deep neural network to learn embeddings from limited training samples and present a Bayesian confidence estimator to capture the inconsistency among crowdsourced labels. Furthermore, we develop a hard example selection procedure that adaptively selects training examples misclassified by the current version of the model. Extensive experiments conducted on three real-world educational data sets demonstrate the superiority of our framework in learning representations from limited data with crowdsourced labels.
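The abstract does not spell out how the Bayesian confidence estimator or the hard example selection procedure are implemented. The following is a minimal illustrative sketch, not the authors' method: it assumes binary crowdsourced labels, a Beta prior for the confidence estimate, and a simple "keep the confidently-labeled examples the current model misclassifies" selection rule. The names `label_confidence`, `select_hard_examples`, and all parameters are hypothetical.

```python
import numpy as np

# --- Bayesian confidence over crowdsourced binary labels (illustrative sketch) ---
# Each example receives several worker votes; with a Beta(alpha, beta) prior on the
# probability that the true label is positive, the posterior mean after observing
# k positive votes out of n gives a confidence score for the aggregated label.
def label_confidence(votes, alpha=1.0, beta=1.0):
    """votes: array of 0/1 worker labels for one example."""
    votes = np.asarray(votes)
    k, n = votes.sum(), votes.size
    p_pos = (alpha + k) / (alpha + beta + n)   # posterior mean of P(y = 1)
    label = int(p_pos >= 0.5)                  # aggregated (consensus) label
    confidence = max(p_pos, 1.0 - p_pos)       # confidence in that label
    return label, confidence

# --- Hard example selection (illustrative sketch) ---
# Keep the examples the current model misclassifies, restricted to examples whose
# aggregated label is sufficiently trustworthy, so the next round focuses on them.
def select_hard_examples(probs, labels, confidences, min_conf=0.6):
    """probs: current model P(y=1) per example; labels/confidences from label_confidence."""
    preds = (probs >= 0.5).astype(int)
    misclassified = preds != labels
    trusted = confidences >= min_conf          # skip examples with noisy consensus
    return np.flatnonzero(misclassified & trusted)

# Toy usage with made-up numbers
votes_per_example = [[1, 1, 0], [0, 0, 0, 1], [1, 0]]
labels, confs = zip(*(label_confidence(v) for v in votes_per_example))
model_probs = np.array([0.3, 0.2, 0.8])        # stand-in for current model outputs
hard_idx = select_hard_examples(model_probs, np.array(labels), np.array(confs))
print(hard_idx)                                 # indices to emphasize in the next round
```

In this toy run, only the first example is both confidently labeled positive and misclassified by the stand-in model, so it would be the one re-emphasized during the next training iteration.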
