Named Entity Recognition Research Articles

The rapid expansion of medical information has made named entity recognition (NER) and relation extraction (RE) essential for clinical decision support systems. Medical texts often contain specialized vocabulary, ambiguous abbreviations, synonyms, polysemous terms, and overlapping entities, all of which complicate extraction. Existing approaches, which typically rely on a single model such as BiLSTM or BERT, often struggle with these complexities. Although large language models (LLMs) have shown promise in various NLP tasks, they still face limitations on the token-level tasks that are critical for medical NER and RE. To address these challenges, we propose COMCARE, a collaborative ensemble framework for context-aware medical NER and RE that integrates multiple pre-trained language models through a collaborative decision strategy. For NER, we combined PubMedBERT and PubMed-T5, leveraging PubMedBERT’s contextual understanding and PubMed-T5’s generative capabilities to handle diverse forms of medical terminology, from standard domain-specific jargon to nonstandard representations such as uncommon abbreviations and out-of-vocabulary (OOV) terms. For RE, we integrated general-domain BERT with biomedical-specific BERT and PubMed-T5, using token-level information from the NER module to enhance context-aware, entity-based relation extraction. To handle long-range dependencies and maintain consistent performance across diverse texts, we implemented a semantic chunking approach and combined the model outputs through a majority voting mechanism. We evaluated COMCARE on several biomedical datasets, including BioRED, ADE, RDD, and the DIANN Corpus. On BioRED, COMCARE achieved F1 scores of 93.76% for NER and 68.73% for RE, outperforming BioBERT by 1.25% and 1.74%, respectively. On the RDD Corpus, COMCARE achieved F1 scores of 77.86% for NER and 86.79% for RE, and it reached 82.48% for NER on ADE and 99.36% for NER on DIANN. These results demonstrate the effectiveness of our approach in handling complex medical terminology and overlapping entities, highlighting its potential to improve clinical decision support systems.
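The abstract does not say how the majority vote is carried out, so the following is only a minimal sketch of one common reading: each model labels the same token sequence, and the most frequent label per token wins. The function name `majority_vote`, the BIO label strings, and the tie-breaking rule (prefer the earliest model in the list) are assumptions made for illustration, not details taken from COMCARE.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token labels from several models by majority vote.

    predictions: list of label sequences, one per model, all aligned to
    the same tokens. Ties go to the earliest model in the list
    (an assumption; the abstract does not describe tie handling).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(labels) != length for labels in predictions):
        raise ValueError("all models must label the same tokens")

    voted = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        # First label, in model order, that reached the highest count.
        voted.append(next(label for label in labels if counts[label] == top))
    return voted


# Hypothetical outputs of three models for the tokens
# ["Aspirin", "induced", "gastric", "bleeding"].
model_a = ["B-Drug", "O", "B-Disease", "I-Disease"]
model_b = ["B-Drug", "O", "O", "B-Disease"]
model_c = ["O", "O", "B-Disease", "I-Disease"]

combined = majority_vote([model_a, model_b, model_c])
print(combined)
```

With only two voters, as in the NER setting described above, every disagreement is a tie, so the tie-breaking rule effectively decides which model is trusted; a real system would likely need a more deliberate rule than the one assumed here.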

Background: Medical narratives are fundamental to the correct identification of a patient’s health condition. This is not only because they describe the patient’s situation; they also contain relevant information about the patient’s context and the evolution of the patient’s health state. Narratives are usually vague and cannot be categorized easily. On the other hand, once the patient’s situation is correctly identified from a narrative, it can be mapped into precise, machine-readable classification schemas and ontologies. To this end, language models can be trained to read and extract elements from these narratives. However, the main problem is the lack of data for model identification and model training in languages other than English. First, gold standard annotations are usually not available due to the high level of data protection for patient data. Second, gold standard annotations (if available) are difficult to access. Alternative available data, such as MIMIC (Sci Data 3:1, 2016), are written in English and cover specific patient conditions such as intensive care. Thus, when model training is required for other types of patients, such as oncology (and not intensive care) patients, this could lead to bias. To facilitate clinical narrative model training, a method for creating high-quality synthetic narratives is needed.

Method: We devised workflows based on generative AI methods to synthesize narratives in the German language without disclosing patients’ health data. Since we required highly realistic narratives, we generated prompts, written with high-quality medical terminology, asking for clinical narratives containing both a main and a co-disease. The frequency distribution of both the main and the co-disease was extracted from the hospital’s structured data, such that the synthetic narratives reflect the disease distribution within the patient cohort. To validate the quality of the synthetic narratives, we annotated them and used them to train a Named Entity Recognition (NER) algorithm. According to our assumptions, the validation of this system implies that the synthesized data used for its training are of acceptable quality.

Result: We report precision, recall, and F1 score for the NER model, also considering metrics that take into account both exact and partial entity matches. The trained models are cautious, with a precision of up to 0.8 for the Entity Type match metric and an F1 score of 0.3.

Conclusion: Despite its inherent limitations, this technology has the potential to allow data interoperability by using encoded diseases across languages and regions without compromising data safety. Additionally, it facilitates the synthesis of unstructured patient data. In this way, the identification and training of models can be accelerated. We believe that this method may be able to generate discharge letters for any combination of main and co-diseases, which would significantly reduce the amount of time healthcare professionals spend writing these letters.
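To make the difference between exact and partial entity matching concrete, here is a simplified sketch of how precision, recall, and F1 can be computed under two matching rules. It is not the authors' evaluation code: the `(start, end, type)` entity representation, the function names, the greedy one-to-one matching, and the example spans are all assumptions chosen for illustration, and "type match" is taken here to mean the same entity type with any character overlap.

```python
def overlaps(a, b):
    """True if two (start, end, type) spans share a character (end exclusive)."""
    return a[0] < b[1] and b[0] < a[1]


def strict_match(gold, pred):
    """Same boundaries and same entity type."""
    return gold == pred


def type_match(gold, pred):
    """Same entity type with overlapping, possibly different, boundaries."""
    return gold[2] == pred[2] and overlaps(gold, pred)


def precision_recall_f1(gold, pred, match):
    """Greedy one-to-one matching of predicted to gold entities."""
    used = set()
    true_positives = 0
    for p in pred:
        for i, g in enumerate(gold):
            if i not in used and match(g, p):
                used.add(i)
                true_positives += 1
                break
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical annotations: three gold entities, two predictions.
gold = [(0, 14, "Disease"), (20, 28, "Drug"), (35, 47, "Disease")]
pred = [(0, 14, "Disease"), (22, 28, "Drug")]

strict_scores = precision_recall_f1(gold, pred, strict_match)
type_scores = precision_recall_f1(gold, pred, type_match)
print("strict:", strict_scores)
print("type:  ", type_scores)
```

In this toy case the model predicts few entities but gets their types right, so precision under the type-match rule is high while recall stays low. That is the pattern the Result paragraph calls "cautious", although the numbers here are invented and do not reproduce the reported scores.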

Articles published on Named Entity Recognition

COMCARE: A Collaborative Ensemble Framework for Context-Aware Medical Named Entity Recognition and Relation Extraction

Clinical entity-aware domain adaptation in low resource setting for inflammatory bowel disease

Chinese Mathematical Knowledge Entity Recognition Based on Linguistically Motivated Bidirectional Encoder Representation from Transformers

RAMIE: retrieval-augmented multi-task information extraction with large language models on dietary supplements.

Harnessing NLP to Investigate Biomarker Interactions and CVD Risks in Elderly Chronic Kidney Disease Patients.

An automated construction method of 3D knowledge graph based on multi-agent systems in virtual geographic scene

Integration of Natural Language Processing (NLP) in MongoDB NoSQL: A New Era of Efficient Text Data Management

Risk Assessment of Typhoon Disaster Chain Based on Knowledge Graph and Bayesian Network

Building knowledge graph for relevant degree recommendations using semantic similarity search and named entity recognition

A Comparative Study of Deep Learning Approaches for Arabic Language Processing

A joint entity and relation extraction framework for handling negative samples problems in named entity recognition

Enhanced Entity Recognition of Islamic Hadiths based-on Hybrid LSTM and AraBERT Model

Entity and relation extraction in the legal domain

How can language models assist with pharmaceuticals manufacturing deviations and investigations?

Named Entity Recognition: A Deep Dive

A large-scale Chinese patent dataset for information extraction

Tasaheel-v2: Development of Innovative Textual Analysis tool with Advanced Features

The Use of Large Language Models in Combination with the Ontological Approach for the Synthesis of Natural Language Text

Information extraction from green channel textual records on expressways using hybrid deep learning

The aluminum standard: using generative Artificial Intelligence tools to synthesize and annotate non-structured patient data
