Abstract
Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost. This is now an essential tool for building low-resource syntactic analyzers such as part-of-speech (POS) taggers. Existing AL heuristics are generally designed on the principle of selecting uncertain yet representative training instances, where annotating these instances may reduce a large number of errors. However, in an empirical study across six typologically diverse languages (German, Swedish, Galician, North Sami, Persian, and Ukrainian), we found the surprising result that even in an oracle scenario where we know the true uncertainty of predictions, these current heuristics are far from optimal. Based on this analysis, we pose the problem of AL as selecting instances that maximally reduce the confusion between particular pairs of output tags. Extensive experimentation on the aforementioned languages shows that our proposed AL strategy outperforms other AL strategies by a significant margin. We also present auxiliary results demonstrating the importance of proper calibration of models, which we ensure through cross-view training, and analysis demonstrating how our proposed strategy selects examples that more closely follow the oracle data distribution. The code is publicly released.
Highlights
Part-of-speech (POS) tagging is a crucial step for language understanding, both being used in automatic language understanding applications such as named entity recognition (NER; Ankita and Nazeer, 2018) and question answering (QA; Wang et al., 2018), and being used in manual lan-
With the help of a senior Griko linguist (Linguist3), we identified a few types of conjunctions that are always coordinating: variations of "and" and of "or" (e or i)
We have presented a novel active learning method for low-resource POS tagging that works by reducing confusion between output tags
Summary
Because we would like to correct errors where tokens with true labels of DET are mislabeled by the model as PRO, asking the human annotator to tag an instance whose true label is PRO, even if it is uncertain, is unlikely to be of much benefit. Inspired by this observation, we pose the problem of AL for POS tagging as selecting tokens that maximally reduce the confusion between the output tags. We also collect 300 new token-level annotations that will help further Griko NLP
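To make the DET/PRO example concrete, the idea of targeting a confused tag pair can be sketched as follows. This is a minimal illustration, not the paper's actual scoring function: it assumes a small annotated development set for estimating the confusion matrix, a hypothetical `predict` callable giving the model's predicted tag for a token, and a simple heuristic that prioritizes unlabeled tokens predicted as the "wrong" side of the most confused pair, since those are the tokens most likely to carry the confused gold tag.

```python
from collections import Counter

def most_confused_pair(gold_tags, pred_tags):
    """Find the (gold, predicted) tag pair that the model confuses
    most often, estimated on a small annotated development set."""
    conf = Counter((g, p) for g, p in zip(gold_tags, pred_tags) if g != p)
    return conf.most_common(1)[0][0] if conf else None

def select_tokens(unlabeled, predict, pair, budget):
    """Pick up to `budget` unlabeled tokens whose predicted tag is the
    'wrong' side of the most confused pair (hypothetical heuristic,
    standing in for the paper's confusion-reduction objective)."""
    gold, pred = pair
    chosen = [tok for tok in unlabeled if predict(tok) == pred]
    return chosen[:budget]

# Toy usage: DET is mislabeled as PRO twice, so (DET, PRO) is the
# most confused pair; annotation effort then goes to tokens the
# model currently tags as PRO.
gold = ["DET", "PRO", "DET", "NOUN"]
pred = ["PRO", "PRO", "PRO", "NOUN"]
pair = most_confused_pair(gold, pred)          # ("DET", "PRO")
model = {"the": "PRO", "it": "PRO", "cat": "NOUN"}
batch = select_tokens(["the", "it", "cat"], model.get, pair, budget=2)
```

In a real system the confusion estimate would come from the model's calibrated posteriors rather than hard predictions, but the selection principle is the same: spend the annotation budget where a specific tag-pair confusion can be resolved.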