ALDANER: Active Learning based Data Augmentation for Named Entity Recognition

Vincenzo Moscato, Marco Postiglione, Giancarlo Sperlì, Andrea Vignali

doi:10.1016/j.knosys.2024.112682

Abstract

Training Named Entity Recognition (NER) models typically requires extensively annotated datasets. This requirement is a significant challenge because manual annotation is labor-intensive and costly, especially in specialized domains such as medicine and finance. Two strategies have emerged as effective responses to data scarcity: (1) Active Learning (AL), which automatically identifies the samples that would most improve model performance if annotated, and (2) data augmentation, which automatically generates new samples. However, AL reduces human effort without eliminating it, and data augmentation often produces incomplete and noisy annotations, which introduce new hurdles in NER model training. In this study, we integrate AL principles into a data augmentation framework, named Active Learning-based Data Augmentation for NER (ALDANER), to prioritize the selection of informative samples from an augmented pool and mitigate the impact of noisy annotations. Our experiments across various benchmark datasets and few-shot scenarios show that our approach surpasses several data augmentation baselines and point to promising avenues for future research.
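The idea of prioritizing informative samples from an augmented pool can be illustrated with a minimal sketch. The abstract does not say how informativeness is scored, so the example below assumes mean token-level entropy, a common uncertainty measure in Active Learning; it is not presented as the authors' method. The function names (`token_entropy`, `sentence_uncertainty`, `select_informative`), the pool layout, and the toy probabilities are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumed layout: each augmented sample is a pair
# (sentence, per-token label probability distributions from a NER model).
TokenProbs = Sequence[Sequence[float]]
Sample = Tuple[str, TokenProbs]


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one token's label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sentence_uncertainty(token_probs: TokenProbs) -> float:
    """Mean token entropy, used here as an assumed informativeness score."""
    if not token_probs:
        return 0.0
    return sum(token_entropy(p) for p in token_probs) / len(token_probs)


def select_informative(pool: Sequence[Sample], k: int) -> List[str]:
    """Return the k sentences whose predictions are most uncertain."""
    ranked = sorted(
        pool,
        key=lambda sample: sentence_uncertainty(sample[1]),
        reverse=True,
    )
    return [sentence for sentence, _ in ranked[:k]]


# Toy augmented pool with made-up probabilities over three labels.
pool: List[Sample] = [
    ("confident sample", [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]]),
    ("uncertain sample", [[0.40, 0.35, 0.25], [0.34, 0.33, 0.33]]),
    ("fairly confident sample", [[0.90, 0.05, 0.05], [0.85, 0.10, 0.05]]),
]

selected = select_informative(pool, k=2)
print(selected)
```

In this toy pool the sentence with near-uniform label distributions is ranked first, since a model that cannot decide between labels has the most to gain from that sample. Other acquisition functions (for example least confidence or margin sampling) could be swapped in by replacing `sentence_uncertainty`.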


Journal: Knowledge-Based Systems
Publication Date: Dec 1, 2024
License type: cc-by-nc-nd

