Accelerating the annotation of sparse named entities by dynamic sentence selection

Yoshimasa Tsuruoka, Jun'ichi Tsujii, Sophia Ananiadou

doi:10.1186/1471-2105-9-s11-s8

Abstract

Background
Previous studies of named entity recognition have shown that a reasonable level of recognition accuracy can be achieved by using machine learning models such as conditional random fields or support vector machines. However, the lack of training data (i.e. annotated corpora) makes it difficult for machine learning-based named entity recognizers to be used in building practical information extraction systems.

Results
This paper presents an active learning-like framework for reducing the human effort required to create named entity annotations in a corpus. In this framework, the annotation work is performed as an iterative and interactive process between the human annotator and a probabilistic named entity tagger. Unlike active learning, our framework aims to annotate all occurrences of the target named entities in the given corpus, so that the resulting annotations are free from the sampling bias which is inevitable in active learning approaches.

Conclusion
We evaluate our framework by simulating the annotation process using two named entity corpora and show that our approach can reduce the number of sentences which need to be examined by the human annotator. The cost reduction achieved by the framework could be drastic when the target named entities are sparse.
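The iterative loop described in the abstract can be illustrated with a small simulation. This is a minimal sketch under our own assumptions, not the authors' implementation: a hypothetical toy model (`ToyEntityModel`, a smoothed per-token probability estimate, not a CRF) stands in for the probabilistic named entity tagger, and after each batch every unexamined sentence is re-ranked by the estimated probability that it contains at least one entity. The gold labels of an invented eight-sentence corpus (not data from the paper) play the role of the human annotator, mirroring a simulated annotation run. Every sentence is eventually examined, so nothing is lost to sampling; the model only changes the order. The hypothetical helper `examined_until_all_found` counts how many sentences must be read before the last entity-bearing sentence is reached.

```python
from collections import Counter
from typing import List, Set, Tuple

# A sentence is (tokens, indices of gold entity tokens). In this simulation the
# gold labels stand in for the human annotator and are read only once a
# sentence has been "examined".
Sentence = Tuple[List[str], Set[int]]


class ToyEntityModel:
    """Hypothetical stand-in for the probabilistic tagger (not a CRF).

    Estimates P(token is part of an entity) from the sentences annotated so
    far, smoothed toward `prior` for rarely seen or unseen tokens.
    """

    def __init__(self, alpha: float = 0.5, prior: float = 0.05) -> None:
        self.alpha = alpha
        self.prior = prior
        self.entity_counts: Counter = Counter()
        self.total_counts: Counter = Counter()

    def update(self, sentence: Sentence) -> None:
        # Add one annotated sentence to the counts ("retraining").
        tokens, entity_idx = sentence
        for i, tok in enumerate(tokens):
            self.total_counts[tok] += 1
            if i in entity_idx:
                self.entity_counts[tok] += 1

    def token_prob(self, tok: str) -> float:
        n, k = self.total_counts[tok], self.entity_counts[tok]
        return (k + self.alpha * self.prior) / (n + self.alpha)

    def sentence_prob(self, tokens: List[str]) -> float:
        # P(at least one entity token), treating tokens as independent.
        p_none = 1.0
        for tok in tokens:
            p_none *= 1.0 - self.token_prob(tok)
        return 1.0 - p_none


def dynamic_order(corpus: List[Sentence], batch_size: int = 1) -> List[int]:
    """Order in which sentences are shown to the (simulated) annotator."""
    model = ToyEntityModel()
    remaining = list(range(len(corpus)))
    order: List[int] = []
    while remaining:
        # Re-rank all unexamined sentences with the current model (stable sort).
        remaining.sort(key=lambda i: model.sentence_prob(corpus[i][0]), reverse=True)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for i in batch:
            order.append(i)
            model.update(corpus[i])  # annotate the sentence, then update the model
    return order


def examined_until_all_found(order: List[int], corpus: List[Sentence]) -> int:
    """Number of sentences read up to the last one that contains an entity."""
    return max((pos for pos, i in enumerate(order) if corpus[i][1]), default=-1) + 1


# Invented toy corpus with sparse entities (illustration only).
corpus: List[Sentence] = [
    (["the", "p53", "protein", "binds"], {1}),
    (["the", "cells", "were", "washed"], set()),
    (["samples", "were", "stored", "frozen"], set()),
    (["buffer", "was", "added", "slowly"], set()),
    (["data", "were", "then", "analysed"], set()),
    (["levels", "of", "p53", "rose"], {2}),
    (["results", "are", "shown", "below"], set()),
    (["mutant", "p53", "was", "detected"], {1}),
]
order = dynamic_order(corpus)
print(examined_until_all_found(order, corpus))                     # 3
print(examined_until_all_found(list(range(len(corpus))), corpus))  # 8 (sequential)
```

In a real annotation session the annotator cannot see where the last entity lies, so a practical system also needs a criterion for when to stop; this sketch omits one and measures the cost after the fact. Because most sentences in a sparse corpus contain no entity, pushing them to the end of the queue is where the savings come from in this toy setting.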

Highlights

Named entities play a central role in conveying important domain-specific information in text, and good named entity recognizers are often required in building practical information extraction systems.
Previous studies have shown that automatic named entity recognition can be performed with a reasonable level of accuracy by using various machine learning models such as support vector machines (SVMs) or conditional random fields (CRFs) [13].

Summary

Introduction

Previous studies have shown that automatic named entity recognition can be performed with a reasonable level of accuracy by using machine learning models such as support vector machines (SVMs) or conditional random fields (CRFs) [13]. However, the lack of annotated corpora, which are indispensable for training such models, makes it difficult to use machine learning-based named entity recognizers in practical information extraction systems and to broaden the scope of text mining applications. The effectiveness of active learning has been demonstrated in several natural language processing tasks, including named entity recognition.

Journal: BMC Bioinformatics
Publication Date: Nov 19, 2008
Citations: 22
License type: CC BY 2.0

Similar Papers

Optimizing Features in Active Machine Learning for Complex Qualitative Content Analysis
Jasy Suet Yan Liew ... Shichun Zhou
01 Jan 2014

The unreasonable effectiveness of machine learning in Moldavian versus Romanian dialect identification
Mihaela Găman ... Radu Tudor Ionescu
International Journal of Intelligent Systems | VOL. 37
17 Nov 2021

Take Expert Advice Judiciously: Combining Groupwise Calibrated Model Probabilities with Expert Predictions
Sumeet Gupta ... Pao-Ann Hsiung
28 Sep 2023

Active Learning with a Human in The Loop
Robyn Kozierok ... Seamus Clancy
01 Nov 2012
