Minimizing manual annotation cost in supervised training from corpora

Sean P Engelson, Ido Dagan

doi:10.3115/981863.981905

Abstract

Corpus-based methods for natural language processing often use supervised training, requiring expensive manual annotation of training corpora. This paper investigates methods for reducing annotation cost by sample selection. In this approach, during training the learning program examines many unlabeled examples and selects for labeling (annotation) only those that are most informative at each stage. This avoids redundantly annotating examples that contribute little new information. This paper extends our previous work on committee-based sample selection for probabilistic classifiers. We describe a family of methods for committee-based sample selection, and report experimental results for the task of stochastic part-of-speech tagging. We find that all variants achieve a significant reduction in annotation cost, though their computational efficiency differs. In particular, the simplest method, which has no parameters to tune, gives excellent results. We also show that sample selection yields a significant reduction in the size of the model used by the tagger.
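The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a committee of already-trained classifiers, measures their disagreement with vote entropy, and selects an example for annotation when that disagreement exceeds a threshold. The names `vote_entropy`, `select_for_annotation`, `tagger_a`, `tagger_b`, and the `threshold` parameter are all hypothetical, and the two toy taggers stand in for real committee members.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Iterable, List, Sequence

# Assumption: a committee member is any callable mapping an example to a label.
Classifier = Callable[[str], Hashable]


def vote_entropy(labels: Sequence[Hashable]) -> float:
    """Entropy (in bits) of the committee's votes; 0.0 means full agreement."""
    total = len(labels)
    probs = [count / total for count in Counter(labels).values()]
    return sum(p * math.log2(1 / p) for p in probs)


def select_for_annotation(
    examples: Iterable[str],
    committee: Sequence[Classifier],
    threshold: float = 0.0,
) -> List[str]:
    """Return the examples on which the committee disagrees more than `threshold`."""
    selected = []
    for example in examples:
        votes = [member(example) for member in committee]
        if vote_entropy(votes) > threshold:
            selected.append(example)  # informative: worth sending to an annotator
    return selected


# Toy committee (hypothetical): the two taggers differ only on the word "can".
def tagger_a(word: str) -> str:
    return {"the": "DET", "can": "NOUN"}.get(word, "NOUN")


def tagger_b(word: str) -> str:
    return {"the": "DET", "can": "VERB"}.get(word, "NOUN")


picked = select_for_annotation(["the", "can", "dog"], [tagger_a, tagger_b])
print(picked)  # ['can']
```

Only "can" is selected, because it is the only word the two taggers label differently; the examples they agree on are skipped, which is the sense in which redundant annotation is avoided.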

Similar Papers

Committee-Based Sample Selection for Probabilistic Classifiers
S. Argamon-Engelson ... I. Dagan
Journal of Artificial Intelligence Research | VOL. 11
15 Nov 1999

Sample selection in natural language learning
Sean P Engelson ... Ido Dagan
01 Jan 1996

Review of classical dimensionality reduction and sample selection methods for large-scale data processing
Xinzheng Xu ... Tianming Liang
Neurocomputing | VOL. 328
17 Aug 2018

Linguistic Approach to Semantic Correlation Rules
Charlotte Effenberger ... G Fragulis
SHS Web of Conferences | VOL. 102
01 Jan 2020
