Abstract

Learning effective language representations from crowdsourced labels is crucial for many real-world machine learning tasks. A challenging aspect of this problem is that the quality of crowdsourced labels suffers from high intra- and inter-observer variability. Since high-capacity deep neural networks can easily memorize all disagreements among crowdsourced labels, directly applying existing supervised language representation learning algorithms may yield suboptimal solutions. In this paper, we propose TACMA, a temporal-aware language representation learning heuristic for crowdsourced labels with multiple annotators. The proposed approach (1) explicitly models intra-observer variability with an attention mechanism, and (2) computes and aggregates per-sample confidence scores from multiple workers to address inter-observer disagreements. The heuristic is extremely easy to implement, in around five lines of code, and is evaluated on four synthetic and four real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducibility, we make our code publicly available at https://github.com/CrowdsourcingMining/TACMA.
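
The authors' actual implementation is available at the repository above; as a rough, hypothetical sketch of the confidence-weighted aggregation idea in point (2), one might weight each worker's per-sample loss by a normalized confidence score. All names, shapes, and the softmax normalization below are our assumptions, not the paper's code:

```python
import torch
import torch.nn.functional as F

def confidence_weighted_loss(logits, worker_labels, confidence):
    """Hypothetical sketch: aggregate per-worker losses with per-sample confidences.

    logits:        (batch, num_classes) model outputs
    worker_labels: (batch, num_workers) integer labels from each annotator
    confidence:    (batch, num_workers) per-sample, per-worker confidence scores
    """
    # Per-worker cross-entropy for each sample: (batch, num_workers)
    per_worker = torch.stack(
        [F.cross_entropy(logits, worker_labels[:, w], reduction="none")
         for w in range(worker_labels.shape[1])],
        dim=1)
    # Normalize confidences across workers, then aggregate the losses
    weights = F.softmax(confidence, dim=1)
    return (weights * per_worker).sum(dim=1).mean()
```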

Highlights

  • Crowdsourcing offers the ability to utilize the power of human computation to generate data annotations that are needed to train various AI systems

  • With recent advances in deep neural networks (DNNs), supervised representation learning (SRL) has led to rapid improvements in the ability to learn intrinsic nonlinear embeddings that keep similar examples close and dissimilar examples far apart in the embedding space

  • Our work focuses on refinements of a popular deep language representation learning paradigm: deep metric learning (DML) (Koch et al., 2015; Xu et al., 2019; Wang et al., 2020b); a minimal illustrative loss sketch follows this list
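
For readers less familiar with DML, the sketch below shows a standard triplet loss, the kind of objective that enforces the close/far embedding structure described above. It is illustrative only; the function name and margin value are our choices, not the paper's method:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet objective: pull similar pairs together and
    push dissimilar pairs at least `margin` apart in embedding space."""
    d_pos = F.pairwise_distance(anchor, positive)  # distance to a similar example
    d_neg = F.pairwise_distance(anchor, negative)  # distance to a dissimilar example
    return F.relu(d_pos - d_neg + margin).mean()
```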


Introduction

Crowdsourcing offers the ability to utilize the power of human computation to generate data annotations that are needed to train various AI systems. For many practical supervised learning applications, it may be infeasible (or very expensive) to obtain objective and reliable labels, for reasons such as the varying skill levels and biases of crowdsourced workers. Despite significant progress on SRL applications such as face recognition (Schroff et al., 2015) and image retrieval (Xia et al., 2014), directly applying existing deep language representation learning approaches to crowdsourced labels may yield poor generalization performance (Han et al., 2018). Because of their high capacity, DNNs will sooner or later memorize the inconsistencies within crowdsourced labels during training, and this phenomenon persists regardless of the choice of training optimizations or network architectures (Han et al., 2018).
