Abstract

Despite sequences being core to NLP, scant work has considered how to handle noisy sequence labels from multiple annotators for the same text. Given such annotations, we consider two complementary tasks: (1) aggregating sequential crowd labels to infer a single best set of consensus annotations; and (2) using crowd annotations as training data for a model that can predict sequences in unannotated text. For aggregation, we propose a novel Hidden Markov Model variant. To predict sequences in unannotated text, we propose a neural approach using Long Short-Term Memory (LSTM) networks. We evaluate a suite of methods across two different applications and text genres: Named-Entity Recognition in news articles and Information Extraction from biomedical abstracts. Results show improvement over strong baselines. Our source code and data are available online.
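For intuition about the aggregation setting, the sketch below shows one generic way an HMM can combine crowd sequence labels: true tags are hidden states, each annotator's labels are treated as noisy emissions governed by a per-annotator confusion matrix, and Viterbi decoding recovers a consensus tag sequence. This is only an illustrative decoder under simplifying assumptions, not the authors' proposed HMM variant; the function name viterbi_crowd is hypothetical, and all parameters (start, transition, and per-worker confusion matrices) are assumed given here, whereas in practice they would be estimated, e.g., with EM.

```python
import numpy as np

def viterbi_crowd(worker_labels, log_start, log_trans, log_conf):
    """Decode a consensus tag sequence from several workers' labels.

    worker_labels: (n_workers, seq_len) int array of observed tags
    log_start:     (n_tags,) log initial-state probabilities
    log_trans:     (n_tags, n_tags) log transition probabilities
    log_conf:      (n_workers, n_tags, n_tags) per-worker log
                   confusion matrices, log P(observed tag | true tag)
    """
    n_workers, seq_len = worker_labels.shape
    n_tags = log_start.shape[0]

    # Combined emission score: sum of per-worker log-likelihoods,
    # treating workers as conditionally independent given the true tag.
    log_emit = np.zeros((seq_len, n_tags))
    for t in range(seq_len):
        for j in range(n_workers):
            log_emit[t] += log_conf[j, :, worker_labels[j, t]]

    # Standard Viterbi recursion over the hidden true-tag sequence.
    delta = log_start + log_emit[0]
    backptr = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        scores = delta[:, None] + log_trans      # scores[prev, cur]
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]

    # Backtrace the highest-scoring path.
    path = [int(delta.argmax())]
    for t in range(seq_len - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]
```

The key design point the sketch illustrates is that annotator reliability enters through the confusion matrices: a careless worker's labels contribute weak, diffuse evidence, while a reliable worker's labels sharply constrain the decoded consensus.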

Highlights

  • Many important problems in Natural Language Processing (NLP) may be viewed as sequence labeling tasks, such as part-of-speech (PoS) tagging, named-entity recognition (NER), and Information Extraction (IE)

  • We find that a principled combination of the “crowd component” with the “sequence component” yields strong improvements

  • Rodrigues et al.’s (2014) Conditional Random Fields (CRF-MA) method achieves the highest precision of all methods but, surprisingly, the lowest F1

Summary

Introduction

Many important problems in Natural Language Processing (NLP) may be viewed as sequence labeling tasks, such as part-of-speech (PoS) tagging, named-entity recognition (NER), and Information Extraction (IE). As with other machine learning tasks, automatic sequence labeling typically requires annotated corpora on which to train predictive models. While such annotation was traditionally performed by domain experts, crowdsourcing has become a popular means of acquiring large labeled datasets at lower cost, though annotations from laypeople may be of lower quality than those from domain experts (Snow et al., 2008).

One might want to induce a single set of high-quality consensus annotations from the crowd labels (Task 1) for various purposes: (i) for direct use at run-time (when a given application requires human-level accuracy in identifying sequences); (ii) for sharing with others; or (iii) for training a predictive model. Given a training set of crowd labels, how can we best predict sequences in unannotated text (Task 2)? Should we (i) treat Task 1 as a pre-processing step and train the model on the consensus labels, or (ii) instead train the model directly on all of the individual annotations, as done by Yang et al. (2010)? We investigate both directions in this work; a simple baseline for direction (i) is sketched below.
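As a concrete illustration of direction (i), the simplest way to realize Task 1 as a pre-processing step is token-level majority voting over the annotators' tag sequences; the consensus output can then be used to train any standard sequence tagger. This is a minimal baseline sketch, not the HMM variant proposed in the paper; the helper name majority_vote and the example tags are assumptions for illustration.

```python
from collections import Counter

def majority_vote(annotations):
    """Token-level majority vote over several annotators' tag sequences.

    annotations: list of equal-length tag sequences, one per annotator,
                 e.g. [["B-PER", "O"], ["B-PER", "B-LOC"], ...]
    Returns one consensus tag sequence; ties are broken arbitrarily
    (the first-seen tag among those tied wins).
    """
    return [Counter(tags).most_common(1)[0][0]
            for tags in zip(*annotations)]

# Example: three annotators labeling the same three-token sentence.
crowd = [["B-PER", "I-PER", "O"],
         ["B-PER", "O",     "O"],
         ["B-PER", "I-PER", "O"]]
print(majority_vote(crowd))  # ['B-PER', 'I-PER', 'O']
```

Unlike a model-based aggregator, majority voting treats all annotators as equally reliable and each token independently, which is exactly the weakness that sequence-aware, reliability-aware methods such as the HMM variant above aim to address.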

