Abstract

The success of NLP research is founded on high-quality annotated datasets, which are usually obtained from multiple expert annotators or crowd workers. The standard practice for training machine learning models is to first adjudicate the disagreements and then perform the training. To this end, there has been a lot of work on aggregating annotations, particularly for classification tasks. However, many other tasks, particularly in NLP, have unique characteristics not considered by standard models of annotation, e.g., label interdependencies in sequence labelling tasks, unrestricted labels for anaphoric annotation, or preference labels for ranking texts. In recent years, researchers have picked up on this and are closing the gap. A first objective of this tutorial is to connect NLP researchers with state-of-the-art aggregation models for a diverse set of canonical language annotation tasks. There is also a growing body of recent work arguing that following the convention and training with adjudicated labels ignores any uncertainty the labellers had in their classifications, which results in models with poorer generalisation capabilities. Therefore, a second objective of this tutorial is to teach NLP researchers how they can augment their (deep) neural models to learn from data with multiple interpretations.
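To make the classification setting concrete, the sketch below is a minimal, hypothetical illustration (not taken from the tutorial materials) of two aggregation strategies: majority voting, the usual adjudication baseline, and a simplified Dawid-Skene-style EM that weights annotators by an estimated accuracy. The toy items, annotators, and labels are invented for the example.

```python
from collections import Counter

# Toy data (invented for illustration): annotations[item][annotator] = label
annotations = {
    "sent1": {"ann1": "POS", "ann2": "POS", "ann3": "NEG"},
    "sent2": {"ann1": "NEG", "ann2": "NEG", "ann3": "NEG"},
    "sent3": {"ann1": "POS", "ann2": "NEG", "ann3": "NEG"},
}
LABELS = ["POS", "NEG"]

def majority_vote(annotations):
    """Adjudicate each item with its most frequent label (the usual baseline)."""
    return {item: Counter(votes.values()).most_common(1)[0][0]
            for item, votes in annotations.items()}

def em_aggregate(annotations, labels, iters=20):
    """Simplified Dawid-Skene-style EM: one accuracy per annotator
    instead of a full confusion matrix."""
    # Initialise the posterior over true labels with raw vote proportions.
    post = {item: {l: sum(v == l for v in votes.values()) / len(votes)
                   for l in labels}
            for item, votes in annotations.items()}
    annotators = {a for votes in annotations.values() for a in votes}
    acc = {a: 0.8 for a in annotators}
    for _ in range(iters):
        # M-step: accuracy = expected fraction of an annotator's labels
        # that match the current soft "true" labels.
        for a in annotators:
            num = den = 0.0
            for item, votes in annotations.items():
                if a in votes:
                    num += post[item][votes[a]]
                    den += 1
            acc[a] = min(max(num / den, 1e-3), 1 - 1e-3)
        # E-step: posterior over true labels given annotator accuracies.
        for item, votes in annotations.items():
            scores = {}
            for l in labels:
                p = 1.0
                for a, v in votes.items():
                    p *= acc[a] if v == l else (1 - acc[a]) / (len(labels) - 1)
                scores[l] = p
            z = sum(scores.values())
            post[item] = {l: scores[l] / z for l in labels}
    return post

print(majority_vote(annotations))        # hard, adjudicated labels
print(em_aggregate(annotations, LABELS)) # soft labels that retain uncertainty
```

Unlike the majority vote, the EM variant keeps a distribution over labels for each item, which is also the kind of soft target that the learning-from-disagreement approaches highlighted below can exploit.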

Highlights

  • Introduction to the field
  • Shortcomings of early practices

  • Including the coders’ disagreements in the learning signal offers the models a richer source of information than adjudicated labels: the disagreements capture the consensus, but can also indicate ambiguity and how humans make mistakes. This improves the generalisation capability of the models and lets them degrade more gracefully, making fewer egregious mistakes (Peterson et al., 2019; Guan et al., 2018). Some of these approaches can also be used for noise distillation, as their learning processes produce aggregated labels that leverage both coder annotation patterns and the knowledge of the task accumulated by the model (Cao et al., 2018; Rodrigues and Pereira, 2018; Albarqouni et al., 2016; Chu et al., 2020).

  • We show how to reformulate NLP tasks with ambiguous categories or scores as preference learning, with example applications related to argument persuasiveness (a minimal sketch follows this list).
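As a rough sketch of the preference-learning reformulation in the last highlight, the snippet below fits a Bradley-Terry model to pairwise persuasiveness judgements, i.e., it learns a latent score per argument such that P(i preferred over j) = sigmoid(score_i − score_j). The argument names and preference pairs are invented, and this is just one simple instance of preference learning, not necessarily the models presented in the tutorial.

```python
import math

# Invented pairwise judgements: (winner, loser) means an annotator found the
# first argument more persuasive than the second. Note the disagreement on
# the arg_b vs arg_c pair.
prefs = [("arg_a", "arg_b"), ("arg_a", "arg_c"),
         ("arg_b", "arg_c"), ("arg_c", "arg_b")]

def bradley_terry(prefs, lr=0.1, iters=500):
    """Fit latent persuasiveness scores by gradient ascent on the
    Bradley-Terry log-likelihood."""
    items = {x for pair in prefs for x in pair}
    score = {x: 0.0 for x in items}
    for _ in range(iters):
        grad = {x: 0.0 for x in items}
        for w, l in prefs:
            p_win = 1.0 / (1.0 + math.exp(score[l] - score[w]))
            grad[w] += 1.0 - p_win   # push the preferred argument up
            grad[l] -= 1.0 - p_win   # push the other argument down
        for x in items:
            score[x] += lr * grad[x]
    return score

# Rank arguments by inferred persuasiveness.
print(sorted(bradley_terry(prefs).items(), key=lambda kv: -kv[1]))
```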


Summary

Description

The disagreement between annotators stems from ambiguous or subjective annotation tasks as well as annotator errors. Rather than adjudicating these disagreements away, recent approaches include them in the learning signal, which improves the generalisation capability of the models and lets them degrade more gracefully, making fewer egregious mistakes (Peterson et al., 2019; Guan et al., 2018). Some of these approaches can also be used for noise distillation, as their learning processes produce aggregated labels that leverage both coder annotation patterns and the knowledge of the task accumulated by the model (Cao et al., 2018; Rodrigues and Pereira, 2018; Albarqouni et al., 2016; Chu et al., 2020). A second objective of the tutorial is to teach NLP researchers how they can augment their existing (deep) neural architectures to learn from data with disagreements; a minimal sketch of this idea follows.
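As a minimal sketch of this second objective (assuming a PyTorch setup; the vote counts, features, and model are placeholders rather than the tutorial's code), the snippet below trains a classifier head against the full distribution of annotator votes, in the spirit of Peterson et al. (2019), instead of a single adjudicated label.

```python
import torch
import torch.nn as nn

# Per-item annotator vote counts over two classes (placeholder values).
vote_counts = torch.tensor([[3., 1.],   # mild disagreement
                            [0., 4.],   # unanimous
                            [2., 2.]])  # genuinely ambiguous item
soft_targets = vote_counts / vote_counts.sum(dim=1, keepdim=True)

features = torch.randn(3, 16)           # stand-in for encoded texts
model = nn.Linear(16, 2)                # stand-in for a real classifier head
optim = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(200):
    log_probs = model(features).log_softmax(dim=1)
    # Cross-entropy against the annotation distribution, not a one-hot label.
    loss = -(soft_targets * log_probs).sum(dim=1).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()
```

The only change relative to standard training is the target: the model sees how uncertain the annotators were about each item, rather than a single hard label.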

Learning outcomes
Part 1. Motivation and Early Approaches to Annotation Analysis
Part 2. Advanced Models of Annotation
Part 3. Learning with Multiple Annotators
Part 4. Practical Session
Audience prerequisites
Presenters