Joined Type Length Encoding for Nested Named Entity Recognition

  • Abstract
  • Literature Map
  • Similar Papers
Abstract

In this article, we propose a new encoding scheme for named entity recognition (NER) called Joined Type-Length encoding (JoinedTL). Unlike most existing named entity encoding schemes, which focus on flat entities, JoinedTL can label nested named entities in a single sequence. JoinedTL uses a packed encoding to represent both the type and the span of a named entity, which not only results in fewer tagged tokens than existing encoding schemes but also enables support for nested NER. We evaluate the effectiveness of JoinedTL for nested NER on three nested NER datasets: GENIA in English, GermEval in German, and PerNest, our newly created nested NER dataset in Persian. We apply CharLSTM+WordLSTM+CRF, a three-layer sequence tagging model, to the three datasets encoded using JoinedTL and two existing nested NE encoding schemes, JoinedBIO and JoinedBILOU. Our experimental results show that CharLSTM+WordLSTM+CRF trained with JoinedTL-encoded datasets achieves F1 scores competitive with those of models trained on datasets encoded by the other two encodings, but with 27%–48% fewer tagged tokens. To leverage the power of the three encodings, i.e., JoinedTL, JoinedBIO, and JoinedBILOU, we propose an encoding-based ensemble method for nested NER. Evaluation results show that the ensemble method achieves higher F1 scores on all datasets than the three models each trained with one of the three encodings. Using nested NE encodings including JoinedTL with CharLSTM+WordLSTM+CRF, we establish new state-of-the-art performance with F1 scores of 83.7 on PerNest, 74.9 on GENIA, and 70.5 on GermEval, surpassing two recent neural models specially designed for nested NER.
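The abstract above does not give the exact tag format, but the core idea of a joined type-length encoding can be sketched as follows. In this illustrative sketch (the function name, tag format, and separator are assumptions, not the paper's specification), each entity is marked only at its first token with a combined "TYPE-LENGTH" tag, so an inner entity whose span starts at a different token than the outer entity fits in the same sequence, and most tokens stay untagged:

```python
# Illustrative sketch of a type-length style encoding (format assumed,
# not the paper's exact specification). Each entity is tagged only at
# its first token with "TYPE-LENGTH"; all other tokens stay "O".
# Nested entities that start at different tokens therefore coexist in
# one tag sequence, with fewer tagged tokens than BIO-style schemes.

def encode_type_length(tokens, entities):
    """entities: list of (start_index, length, entity_type)."""
    labels = [[] for _ in tokens]
    for start, length, etype in entities:
        labels[start].append(f"{etype}-{length}")
    # Join entities that start at the same token with "|";
    # untagged tokens become "O".
    return [("|".join(tags) if tags else "O") for tags in labels]

tokens = ["human", "interleukin-2", "gene", "expression"]
# Outer DNA entity spans tokens 0-2; inner protein entity is token 1.
entities = [(0, 3, "DNA"), (1, 1, "protein")]
print(encode_type_length(tokens, entities))
# ['DNA-3', 'protein-1', 'O', 'O']
```

Here only two of four tokens carry tags, whereas a BIO-style encoding of the same nesting would tag all three tokens of the outer entity, which is consistent with the reduction in tagged tokens the abstract reports.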

Similar Papers
  • Research Article
  • Cited by 6
  • 10.1007/s40747-024-01518-9
Joint entity and relation extraction combined with multi-module feature information enhancement
  • Jun 16, 2024
  • Complex & Intelligent Systems
  • Yao Li + 3 more

Joint entity and relation extraction methods integrate the entity extraction and relation classification tasks by sharing an encoding layer. However, this sharing faces challenges due to incongruities in the contextual information captured by the two subtasks, resulting in potential feature conflicts and adverse effects on model performance. To address this, we introduce a novel joint entity and relation extraction method that incorporates multi-module feature information enhancement (MFIE) (https://github.com/liyao345496280/Relation-extraction). We employ a relation awareness enhancement module for the entity extraction task, which directs the model's focus towards extracting entities closely related to potential relations using a potential relation extraction module and an attention mechanism. For the relation extraction task, we implement an entity information enhancement module that uses entity extraction results to augment the original feature information through a gating mechanism, thereby enhancing relation classification performance. Experiments on the NYT and WebNLG datasets demonstrate that our method performs well. Compared to the state-of-the-art method, the F1 score on the NYT dataset improved by 0.7%.

  • Research Article
  • 10.1016/j.compbiomed.2025.110964
Manual annotation of Robson criteria and obstetric entities: Inter-annotator agreement and initial NER models implementation.
  • Oct 1, 2025
  • Computers in biology and medicine
  • Orlando Ramos-Flores + 9 more


  • Research Article
  • Cited by 73
  • 10.1145/3497842
Can BERT Dig It? Named Entity Recognition for Information Retrieval in the Archaeology Domain
  • Sep 16, 2022
  • Journal on Computing and Cultural Heritage
  • Alex Brandsen + 3 more

The amount of archaeological literature is growing rapidly. Until recently, these data were only accessible through metadata search. We implemented a text retrieval engine for a large archaeological text collection (~658 million words). In archaeological IR, domain-specific entities such as locations, time periods and artefacts play a central role. This motivated the development of a named entity recognition (NER) model to annotate the full collection with archaeological named entities. In this article, we present ArcheoBERTje, a BERT (Bidirectional Encoder Representations from Transformers) model pre-trained on Dutch archaeological texts. We compare the model’s quality and output on an NER task to a generic multilingual model and a generic Dutch model. We also investigate ensemble methods for combining multiple BERT models, and combining the best BERT model with a domain thesaurus using conditional random fields. We find that ArcheoBERTje outperforms both the multilingual and Dutch model significantly with a smaller standard deviation between runs, reaching an average F1 score of 0.735. The model also outperforms ensemble methods combining the three models. Combining ArcheoBERTje predictions and explicit domain knowledge from the thesaurus did not increase the F1 score. We quantitatively and qualitatively analyse the differences between the vocabulary and output of the BERT models on the full collection and provide some valuable insights in the effect of fine-tuning for specific domains. Our results indicate that for a highly specific text domain such as archaeology, further pre-training on domain-specific data increases the model’s quality on NER by a much larger margin than shown for other domains in the literature, and that domain-specific pre-training makes the addition of domain knowledge from a thesaurus unnecessary.

  • Research Article
  • Cited by 17
  • 10.1186/s12859-021-04236-y
Improving deep learning method for biomedical named entity recognition by using entity definition information
  • Dec 1, 2021
  • BMC Bioinformatics
  • Ying Xiong + 6 more

Background: Biomedical named entity recognition (NER) is a fundamental task of biomedical text mining that finds the boundaries of entity mentions in biomedical text and determines their entity type. To accelerate the development of biomedical NER techniques in Spanish, the PharmaCoNER organizers launched a competition to recognize pharmacological substances, compounds, and proteins. Biomedical NER is usually treated as a sequence labeling task, and almost all state-of-the-art sequence labeling methods ignore the meaning of different entity types. In this paper, we investigate methods to introduce the meaning of entity types into deep learning methods for biomedical NER and apply them to the PharmaCoNER 2019 challenge. The meaning of each entity type is represented by its definition information. Material and method: We investigate how to use entity definition information in two methods: (1) SQuAD-style machine reading comprehension (MRC) methods that treat entity definition information as the query and biomedical text as the context, and predict answer spans as entities; (2) span-level one-pass (SOne) methods that predict entity spans one type at a time and introduce entity type meaning, represented by entity definition information. All models are trained and tested on the PharmaCoNER 2019 corpus, and their performance is evaluated by strict micro-average precision, recall, and F1-score. Results: Entity definition information improves both SQuAD-style MRC and SOne methods by about 0.003 in micro-averaged F1-score. The SQuAD-style MRC model using entity definition information as the query achieves the best performance, with a micro-averaged precision of 0.9225, a recall of 0.9050, and an F1-score of 0.9137. It outperforms the best model of the PharmaCoNER 2019 challenge by 0.0032 in F1-score. Compared with the state-of-the-art model that does not use manually crafted features, our model obtains a significant 1% improvement in F1-score. These results indicate that entity definition information is useful for deep learning methods on biomedical NER. Conclusion: Our entity-definition-enhanced models achieve a state-of-the-art micro-average F1-score of 0.9137, which implies that entity definition information has a positive impact on biomedical NER. In the future, we will explore more entity definition information from knowledge graphs.
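The query/context framing described in the abstract above can be sketched in a few lines. This is an illustrative shape only (the function name, query wording, and data layout are assumptions, not the paper's implementation): the entity-type definition serves as the query, the sentence as the context, and a span-prediction model would answer each pair with zero or more entity spans of that type:

```python
# Illustrative shape of SQuAD-style MRC for NER (structure assumed, not
# the paper's implementation): the entity-type definition is the query,
# the sentence is the context, and predicted answer spans would be
# returned as entities of that type.

def mrc_examples(definitions, sentence):
    """definitions: {entity_type: definition text}. Yields one
    (entity_type, query, context) triple per entity type; a downstream
    span-prediction model answers each with zero or more spans."""
    for etype, definition in definitions.items():
        query = f"Find all {etype} mentions. {etype}: {definition}"
        yield etype, query, sentence

pairs = list(mrc_examples(
    {"PROTEIN": "a large biomolecule made of amino acid chains."},
    "IL-2 binds the IL-2 receptor.",
))
print(pairs[0][1])
# Find all PROTEIN mentions. PROTEIN: a large biomolecule made of amino acid chains.
```

One consequence of this framing, visible in the sketch, is that each sentence is processed once per entity type, trading extra passes for type-aware queries.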

  • Research Article
  • 10.1609/aaai.v39i23.34591
GuideNER: Annotation Guidelines Are Better than Examples for In-Context Named Entity Recognition
  • Apr 11, 2025
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Shizhou Huang + 4 more

Large language models (LLMs) demonstrate impressive performance on downstream tasks through in-context learning (ICL). However, there is a significant gap between their performance on Named Entity Recognition (NER) and that of fine-tuning methods. We believe this discrepancy is due to inconsistencies in labeling definitions in NER. In addition, recent research indicates that LLMs do not learn specific input-label mappings from demonstrations. Therefore, we argue that using examples to implicitly capture the mapping between inputs and labels in in-context learning is not suitable for NER. Instead, the model must be explicitly informed of the range of entities each label covers, for example through annotation guidelines. In this paper, we propose GuideNER, which uses LLMs to summarize concise annotation guidelines as contextual information in ICL. We conducted experiments on widely used NER datasets, and the results indicate that our method consistently and significantly outperforms state-of-the-art methods while using shorter prompts. On the GENIA dataset in particular, our model outperforms the previous state-of-the-art model by 12.63 F1 points.

  • Supplementary Content
  • Cited by 2
  • 10.3389/frai.2025.1584203
Artificial intelligence in healthcare text processing: a review applied to named entity recognition
  • Jul 7, 2025
  • Frontiers in Artificial Intelligence
  • Samuel Santana De Almeida + 11 more

Context: Traditional methods such as rule-based systems, word embeddings (e.g., Word2Vec, GloVe), and sequence tagging models such as CRFs and HMMs have difficulty capturing the complex and nuanced context of medical texts, leading to low precision and inflexibility. These methods also struggle with the inherent variability of medical language and often require large, difficult-to-obtain labeled datasets. Objective: We examine the growing importance of Named Entity Recognition (NER) in the analysis of healthcare texts. NER, a fundamental technique in Natural Language Processing (NLP), automatically identifies and categorizes named entities in text, such as names of people and organizations and, in medical texts, medical conditions and drug names. This facilitates better information retrieval, personalized medicine approaches, and clinical decision support systems. Methods: A systematic mapping was carried out focusing on advanced language models, specifically transformer-based models such as BERT. These models are known for capturing complex semantic dependencies and linguistic nuances, which are crucial for accurate processing of medical texts. Transformer architectures, unlike traditional techniques such as CNNs and RNNs, are better suited to the contextual and semantic nature of medical texts because they can manage long sequences with high precision. Results: The results indicate that transformer-based models, in particular BERT and its specialized variants (e.g., ClinicalBERT), consistently demonstrate high performance on NER tasks, with F1 scores often exceeding 97%, outperforming traditional and hybrid methods. When examining the geographical distribution of contributions, the research identifies a significant contribution from China, followed by the United States. These findings have crucial implications for the integration of NER technologies into the Brazilian National Health System (SUS). Conclusion: This systematic review contributes to the advancement of NER in health texts by evaluating methods, presenting results, and highlighting the wider implications for the field. The article is structured into the following sections: Methodology, Bibliometric analysis, Results and discussion, Threats to validity, Future work, and Conclusion. This organization provides a comprehensive review of the research, its impact, and future directions, highlighting the importance of keeping up to date with advances in the field to increase the relevance of NER applications in healthcare.

  • Addendum
  • Cited by 25
  • 10.1098/rsif.2020.0782
Corrigendum to ‘Deep learning improves taphonomic resolution: high accuracy in differentiating tooth marks made by lions and jaguars'
  • Oct 21, 2020
  • Journal of the Royal Society Interface
  • Blanca Jiménez-García + 4 more


  • Research Article
  • 10.1016/j.ijmedinf.2025.106230
Efficient medical NER with limited data: Enhancing LLM performance through annotation guidelines.
  • Mar 1, 2026
  • International journal of medical informatics
  • Emiko Shinohara + 1 more

Named entity recognition (NER) is critical in natural language processing (NLP), particularly in the medical field, where accurate identification of entities, such as patient information and clinical events, is essential. Traditional NER approaches rely heavily on large, annotated corpora, which are resource intensive. Large language models (LLMs) offer new NER approaches, particularly through in-context and few-shot learning. This study investigates the effects of incorporating annotation guidelines into prompts for NER via LLMs, with a specific focus on their impact on few-shot learning performance across various medical corpora. We designed eight different prompt patterns, combining few-shot examples with annotation guidelines of varying complexity, and evaluated their performance via three prominent LLMs: GPT-4o, Claude 3.5 Sonnet, and gpt-oss-120b. Additionally, we employed three diverse medical corpora: i2b2-2014, i2b2-2012, and MedTxt-CR. Accuracy was assessed via precision, recall, and the F1 score, with evaluation methods aligned with those used in relevant shared tasks to ensure the comparability of the results. Our findings indicate that adding detailed annotation guidelines to few-shot prompts improves the recall and F1 score in most cases. Including annotation guidelines in prompts enhances the performance of LLMs in NER tasks, making this a practical approach for developing accurate NLP systems in resource-constrained environments. Although annotation guidelines are essential for evaluation and example creation, their integration into LLM prompts can further optimize few-shot learning, especially within specialized domains such as medical NLP.
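The study above reports that adding annotation guidelines to few-shot prompts improves recall and F1. A minimal sketch of such prompt assembly follows; the structure, wording, and example labels are assumptions for illustration, not the study's actual templates:

```python
# Minimal sketch of assembling an NER prompt that combines annotation
# guidelines with few-shot examples (structure and wording assumed, not
# the study's actual templates).

def build_prompt(guidelines, examples, text):
    """examples: list of (sentence, labeled_entities) pairs."""
    parts = [
        "You are a medical NER annotator.",
        "Annotation guidelines:",
        guidelines,
        "",
        "Examples:",
    ]
    for sentence, labeled in examples:
        parts.append(f"Input: {sentence}\nEntities: {labeled}")
    parts += ["", f"Input: {text}", "Entities:"]
    return "\n".join(parts)

prompt = build_prompt(
    "Tag DATE for explicit calendar dates; tag DRUG for medication names.",
    [("Started aspirin on 2021-03-04.", "DRUG: aspirin; DATE: 2021-03-04")],
    "Metformin was discontinued in June 2022.",
)
print(prompt.splitlines()[0])
# You are a medical NER annotator.
```

The prompt ends at "Entities:" so the model completes the annotation for the final input; the guidelines section is what the study varies in complexity across its eight prompt patterns.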

  • Research Article
  • Cited by 8
  • 10.1109/access.2019.2961118
Multilayer ToI Detection Approach for Nested NER
  • Jan 1, 2019
  • IEEE Access
  • Lin Sun + 3 more

Nested entities commonly exist in news articles and biomedical corpora, and nested NER remains a great challenge in the field of named entity recognition (NER). Unlike the structural models in previous work, this paper presents a comprehensive study of nested NER by means of text-of-interest (ToI) detection, proposing a novel ToI-CNN with dual transformer encoders (ToI-CNN + DTE) model. We design a directional self-attention mechanism to encode contextual representations over the whole sentence in the forward and backward directions. The features of the entities are extracted from the contextual token representations by a convolutional neural network. Moreover, we use a HAT pooling operation to convert variable-length ToIs into a fixed-length vector, which is connected to a fully connected network for classification. The layer where a nested entity is located can be evaluated by multi-task learning jointly with layer classification. The experimental results show that our model achieves excellent performance in F1 score, training cost, and layer evaluation on the nested NER datasets.

  • Book Chapter
  • Cited by 7
  • 10.1007/978-981-33-6162-1_2
MTNER: A Corpus for Mongolian Tourism Named Entity Recognition
  • Jan 1, 2020
  • Xiao Cheng + 3 more

Named Entity Recognition is an essential tool for machine translation. Traditional Named Entity Recognition focuses on person, location, and organization names; however, there is still a lack of data for identifying travel-related named entities, especially in Mongolian. In this paper, we introduce a new corpus for Mongolian Tourism Named Entity Recognition (MTNER), consisting of 16,000 sentences annotated with 18 entity types. We trained in-domain BERT representations on 10 GB of unannotated Mongolian text and trained a NER model based on the BERT tagging model with the new corpus. The model achieves an overall F1 score of 82.09 on Mongolian Tourism Named Entity Recognition, an absolute increase of +3.54 F1 over the traditional CRF Named Entity Recognition method.

  • Video Transcripts
  • 10.48448/7zz2-bt89
A Unified Generative Framework for Various NER Subtasks
  • Aug 1, 2021
  • Underline Science Inc.
  • Hang Yan

Named Entity Recognition (NER) is the task of identifying spans that represent entities in sentences. Depending on whether entity spans are nested or discontinuous, NER can be categorized into the flat, nested, and discontinuous NER subtasks. These subtasks have mainly been solved by token-level sequence labelling or span-level classification, but such solutions can hardly tackle all three kinds of NER subtasks concurrently. To that end, we propose to formulate the NER subtasks as an entity span sequence generation task, which can be solved by a unified sequence-to-sequence (Seq2Seq) framework. Based on our unified framework, we can leverage a pre-trained Seq2Seq model to solve all three kinds of NER subtasks without specially designing the tagging schema or enumerating spans. We exploit three types of entity representations to linearize entities into a sequence. Our proposed framework is easy to implement and achieves state-of-the-art (SoTA) or near-SoTA performance on eight English NER datasets, including two flat NER datasets, three nested NER datasets, and three discontinuous NER datasets.
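The linearization idea in the abstract above can be sketched as follows. The exact representation is assumed here (the paper explores several entity representations); in this sketch each entity becomes its token positions followed by a type marker, so flat, nested, and discontinuous spans all fit one flat target sequence:

```python
# Illustrative linearization of entity spans into a flat target sequence
# for a Seq2Seq model (representation assumed; the paper explores three
# variants). Each entity is emitted as its token positions followed by a
# type marker, which accommodates flat, nested, and discontinuous spans
# in one format.

def linearize(entities):
    """entities: list of (positions, entity_type); positions may be
    non-contiguous for discontinuous entities."""
    target = []
    for positions, etype in entities:
        target.extend(positions)
        target.append(f"<{etype}>")
    return target

# A discontinuous entity over token positions 2, 3, 6 and a flat entity
# over position 6 (also overlapping, i.e., nested in the loose sense).
print(linearize([([2, 3, 6], "symptom"), ([6], "body")]))
# [2, 3, 6, '<symptom>', 6, '<body>']
```

A decoder trained on such targets generates position and type tokens left to right, which is why no tagging schema or span enumeration is needed.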

  • Research Article
  • Cited by 1
  • 10.3897/biss.8.140428
BiodiViz: Leveraging NER and RE for Automated Knowledge Graph Generation in Biodiversity Research
  • Oct 29, 2024
  • Biodiversity Information Science and Standards
  • Angela Shannen Tan + 2 more

In biodiversity research, the integration of machine learning and data visualization is increasingly important for uncovering valuable insights from academic literature. This study introduces an innovative knowledge graph application, BiodiViz, designed to translate intricate text into intuitive visual representations, fostering a deeper comprehension of biodiversity relationships. BiodiViz uses the top-performing Named Entity Recognition (NER) and Relation Extraction (RE) models to automatically generate a comprehensive knowledge graph for biodiversity research. The NER model extracts and categorizes entities like organisms, phenomena, and habitats, while the RE model identifies relationships such as "have," "occur in," and "influence" from the BiodivNERE dataset (Abdelmageed et al. 2022). These entities and relationships are organized into nodes and edges within a graph. Researchers input text into BiodiViz, producing a visual knowledge graph that simplifies the analysis of complex biodiversity data, reducing manual effort and enhancing efficiency.

Named Entity Recognition & Relation Extraction: BiodiViz leverages advanced Bidirectional Encoder Representations from Transformers (BERT)-based Large Language Models (LLMs) (Rogers et al. 2020), fine-tuned specifically for NER and RE tasks using the BiodivNERE dataset. The fine-tuning process involved various models, including BERT (Devlin et al. 2019), ELECTRA (Clark et al. 2020), and BiodivBERT (Abdelmageed et al. 2023). These models were evaluated using the F1-score as the main metric, which is the harmonic mean of precision (the proportion of true positive results among all positive predictions) and recall (the proportion of true positive results among all actual positives). BiodivBERT achieved an F1-score of 77.16% for the NER task, while BERT excelled in the RE task with an F1-score of 81.28%. Rigorous hyperparameter optimization further enhanced the performance of BiodivBERT in the RE task by 3.38%. The BiodivNERE corpora by Abdelmageed et al. (2022) were used to fine-tune several models for NER and RE tasks in the biodiversity domain. The first corpus from the BiodivNERE corpora is BiodivNER, a gold standard dataset (manually labelled test corpora) for evaluating NER tasks. The fine-tuning process employed the token classification method from the Hugging Face library (Hugging Face 2023b), which assigns labels to each token in a sequence. Experiments were conducted with a batch size of four, meaning the model processes four examples/rows of data at a time before making an update to improve its learning; this was due to the constraints of the NVIDIA® GeForce RTX™ 3060 graphics processor (NVIDIA 2024). Model performance was evaluated using the seqeval library (Nakayama 2018), focusing on accuracy, precision, recall, and F1 scores. For text classification, the second corpus, BiodivRE, was utilized, following previous research recommendations to explore fine-tuning settings for BiodivBERT. Hyperparameter optimization (Feurer and Hutter 2019) was conducted using Hugging Face's Trainer API with an Optuna backend (Hugging Face 2023a), concentrating on learning rate and the number of training epochs (i.e., the number of complete passes through the entire dataset during model training).

The BiodiViz Knowledge Graph Application: The fine-tuned NER and RE models with the best F1-scores (BiodivBERT and BERT, respectively) were integrated into the knowledge graph application. Fig. 1 illustrates the flowchart of the application pipeline. Each sentence in the input text goes through the NER model to identify and label the entities within the sentence. These labeled entities, together with the original sentence, are then input to the RE model, which analyzes every pair of entities for a potential relation and outputs the type of relation they share. The application then uses this data to create a graph with appropriate labels and color-coding. An example of the application's user interface with the knowledge graph is shown in Fig. 2. This study highlights the practical application of machine learning and data visualization in advancing biodiversity research, emphasizing the importance of developing user-friendly tools to support scientific exploration and discovery. The BiodiViz application, including the code and resources, is available on GitHub*1, providing an accessible tool for biodiversity researchers to streamline their analyses.

  • Conference Article
  • Cited by 16
  • 10.18653/v1/2021.eacl-main.318
GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition
  • Jan 1, 2021
  • Xinyan Zhao + 2 more

Instead of using expensive manual annotations, researchers have proposed to train named entity recognition (NER) systems using heuristic labeling rules. However, devising labeling rules is challenging because it often requires a considerable amount of manual effort and domain expertise. To alleviate this problem, we propose GLARA, a graph-based labeling rule augmentation framework, to learn new labeling rules from unlabeled data. We first create a graph with nodes representing candidate rules extracted from unlabeled data. Then, we design a new graph neural network to augment labeling rules by exploring the semantic relations between rules. We finally apply the augmented rules on unlabeled data to generate weak labels and train a NER model using the weakly labeled data. We evaluate our method on three NER datasets and find that we can achieve an average improvement of +20% F1 score over the best baseline when given a small set of seed rules.

  • Research Article
  • Cited by 1
  • 10.3390/app14051717
PNER: Applying the Pipeline Method to Resolve Nested Issues in Named Entity Recognition
  • Feb 20, 2024
  • Applied Sciences
  • Hongjian Yang + 2 more

Named entity recognition (NER) in natural language processing encompasses three primary types: flat, nested, and discontinuous. While the flat type often garners attention from researchers, nested NER poses a significant challenge. Current approaches to nested NER involve sequence labeling methods with merged label layers, cascaded models, and methods rooted in reading comprehension. Among these, sequence labeling with merged label layers stands out for its simplicity and ease of implementation, yet known issues persist within this method, which we aim to address. In this study, we augment the sequence labeling approach with a pipeline model bifurcated into sequence labeling and text classification tasks. Departing from annotating specific entity categories, we amalgamated types into main and sub-categories for a unified treatment. These categories were subsequently embedded as identifiers in the recognized text for the text classification task. We used BERT+BiLSTM+CRF for sequence labeling and the BERT model for text classification. Experiments were conducted across three nested NER datasets: GENIA, CMeEE, and GermEval 2014, featuring annotations varying from four levels to two. Before model training, we conducted separate statistical analyses of nested entities within the medical dataset CMeEE and the everyday-life dataset GermEval 2014. Our research unveiled a consistent dominance of a particular entity category within nested entities across both datasets, suggesting the potential utility of labeling primary and subsidiary entities for effective category recognition. Model performance was evaluated with F1 scores, counting a prediction as correct only when both the complete entity name and its category were identified. Results showed substantial performance gains from our proposed modifications compared to the original method. Additionally, our improved model is strongly competitive with existing models: F1 scores on the GENIA, CMeEE, and GermEval 2014 datasets reached 79.21, 66.71, and 87.81, respectively. Our research highlights that, while preserving the original method's simplicity and ease of implementation, our enhanced model achieves heightened performance compared to other methodologies.

  • Research Article
  • 10.1016/j.compbiomed.2025.111013
A comprehensive evaluation of large language models for information extraction from unstructured electronic health records in residential aged care.
  • Oct 1, 2025
  • Computers in biology and medicine
  • Dinithi Vithanage + 5 more

Despite rapid healthcare digitization, extracting information from unstructured electronic health records (EHRs), such as nursing notes, remains challenging due to inconsistencies and ambiguities in clinical documentation. Generative large language models (LLMs) have emerged as promising tools for automating information extraction (IE); however, their application in real-world clinical settings, such as residential aged care (RAC), is limited by critical gaps. Prior studies have often focused on structured EHR data and conventional evaluation metrics such as accuracy and F1 score, overlooking critical aspects like robustness, fairness, bias, and contextual relevance, particularly in unstructured clinical narratives. To address these gaps, this study develops a holistic evaluation framework for clinical IE from free-text nursing notes in the Australian RAC. We systematically evaluate 17 LLMs, including general-purpose and healthcare-specific variants (e.g., LLaMA, Mistral, Gemini, T5) across retrieval-augmented generation (RAG) frameworks and few-shot learning configurations (one-shot, three-shot, four-shot, five-shot). The evaluation focuses on two clinical IE tasks: named entity recognition (NER) and summarization. Results reveal LLaMA 3.1 achieved 88.58% accuracy, 87.43% F1 score in NER, 88.18% F1 score, and 83.15% relevance in summarization. However, robustness remained low (4.00% for NER, 4.31% for summarization) despite excellent fairness (99.9%) and minimal bias (0.11%) in both tasks. Further, healthcare-specific LLMs slightly outperform general models, and RAG-based approaches (LangChain, LlamaIndex) yield superior results. Task-specific optimal few-shot settings emerged: three-shot for NER and five-shot for summarization. This study provides a foundation for safely integrating generative AI into clinical decision support.
