DANSK: Domain Generalization of Danish Named Entity Recognition

Kenneth Enevoldsen, Emil Trenckner Jessen, Rebekah Baglini

doi:10.3384/nejlt.2000-1533.2024.5249

Open Access

https://doi.org/10.3384/nejlt.2000-1533.2024.5249

Abstract

Named entity recognition is an important application within Danish NLP, essential within both industry and research. However, Danish NER is inhibited by a lack of coverage across domains and entity types. As a consequence, no current models are capable of fine-grained named entity recognition, nor have they been evaluated for potential generalizability issues across datasets and domains. To alleviate these limitations, this paper introduces: 1) DANSK, a named entity dataset that provides high-granularity tagging as well as within-domain evaluation of models across a diverse set of domains; 2) three generalizable models with fine-grained annotation, available in DaCy 2.6.0; and 3) an evaluation of current state-of-the-art models’ ability to generalize across domains. The evaluation of existing and new models revealed notable performance discrepancies across domains, which should be addressed within the field. Shortcomings in the annotation quality of the dataset and their impact on model training and evaluation are also discussed. Despite these limitations, we advocate for the use of the new dataset DANSK alongside further work on generalizability within Danish NER.

Journal: Northern European Journal of Language Technology
Publication Date: Jul 23, 2024
License type: CC BY 4.0
