Entities In Text Research Articles

With the exponential growth of the life sciences literature, biomedical text mining (BTM) has become an essential technology for accelerating the extraction of insights from publications. The identification of entities in texts, such as diseases or genes, and their normalization, i.e. grounding them in a knowledge base, are crucial steps in any BTM pipeline to enable information aggregation from multiple documents. However, tools for these two steps are rarely applied in the same context in which they were developed. Instead, they are applied "in the wild," i.e. on application-dependent text collections that differ moderately to extremely from those used for training, e.g. in focus, genre or text type. This raises the question of whether the reported performance, usually obtained by training and evaluating on different partitions of the same corpus, can be trusted for downstream applications. Here, we report on the results of a carefully designed cross-corpus benchmark for entity recognition and normalization, where tools were applied systematically to corpora not used during their training. Based on a survey of 28 published systems, we selected five, based on predefined criteria like feature richness and availability, for an in-depth analysis on three publicly available corpora covering four entity types. Our results present a mixed picture and show that cross-corpus performance is significantly lower than the in-corpus performance. HunFlair2, the redesigned and extended successor of the HunFlair tool, showed the best performance on average, being closely followed by PubTator Central. Our results indicate that users of BTM tools should expect a lower performance than the originally published one when applying tools "in the wild" and show that further research is necessary for more robust BTM tools. All our models are integrated into the Natural Language Processing (NLP) framework flair: https://github.com/flairNLP/flair.
Code to reproduce our results is available at: https://github.com/hu-ner/hunflair2-experiments.
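
To make the two steps named in the abstract concrete, the following is a minimal sketch of how entity recognition and normalization might each be scored against gold annotations. It is not the evaluation protocol of the paper: the matching criteria (exact span and type for recognition, document-level type and identifier for normalization), the `Mention` type, and all identifiers and offsets in the toy data are assumptions made for illustration only.

```python
from typing import NamedTuple


class Mention(NamedTuple):
    """Hypothetical annotation record: one entity mention in one document."""
    doc_id: str
    start: int
    end: int
    entity_type: str
    kb_id: str  # identifier the mention is normalized to (made-up values below)


def prf(gold: set, pred: set) -> tuple[float, float, float]:
    """Micro precision, recall and F1 over two sets of comparable keys."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def ner_view(mentions):
    """Recognition: a hit requires the exact span and entity type (assumption)."""
    return {(m.doc_id, m.start, m.end, m.entity_type) for m in mentions}


def nen_view(mentions):
    """Normalization: a hit requires type and identifier per document (assumption)."""
    return {(m.doc_id, m.entity_type, m.kb_id) for m in mentions}


# Toy data, invented for this sketch.
gold = [
    Mention("doc1", 0, 4, "Gene", "KB:1"),
    Mention("doc1", 10, 18, "Disease", "KB:2"),
    Mention("doc2", 3, 9, "Chemical", "KB:3"),
]
pred = [
    Mention("doc1", 0, 4, "Gene", "KB:1"),       # correct span and identifier
    Mention("doc1", 10, 18, "Disease", "KB:9"),  # correct span, wrong identifier
    Mention("doc2", 20, 25, "Gene", "KB:4"),     # spurious mention
]

ner_scores = prf(ner_view(gold), ner_view(pred))
nen_scores = prf(nen_view(gold), nen_view(pred))
print("NER P/R/F1:", ner_scores)
print("NEN P/R/F1:", nen_scores)
```

In a cross-corpus setting, the same scoring would be run with predictions from a tool trained on one corpus and gold annotations taken from a different corpus, which is where the abstract reports the drop in performance.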

The BioRED track at BioCreative VIII calls for a community effort to identify, semantically categorize, and highlight the novelty factor of the relationships between biomedical entities in unstructured text. Relation extraction is crucial for many biomedical natural language processing (NLP) applications, from drug discovery to custom medical solutions. The BioRED track simulates a real-world application of biomedical relationship extraction and, as such, considers multiple biomedical entity types, normalized to their corresponding database identifiers, and defines relationships between them in the documents. The challenge consisted of two subtasks: (i) in Subtask 1, participants were given the article text and human expert annotated entities, and were asked to extract the relation pairs and identify their semantic type and novelty factor, and (ii) in Subtask 2, participants were given only the article text, and were asked to build an end-to-end system that could identify and categorize the relationships and their novelty. We received a total of 94 submissions from 14 teams worldwide. The highest F-scores achieved for Subtask 1 were: 77.17% for relation pair identification, 58.95% for relation type identification, 59.22% for novelty identification, and 44.55% when evaluating all of the above aspects together (comprehensive relation extraction). The highest F-scores achieved for Subtask 2 were: 55.84% for relation pair, 43.03% for relation type, 42.74% for novelty, and 32.75% for comprehensive relation extraction. The entire BioRED track dataset and other challenge materials are available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC8-BioRED-track/ and https://codalab.lisn.upsaclay.fr/competitions/13377 and https://codalab.lisn.upsaclay.fr/competitions/13378. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/BC8-BioRED-track/, https://codalab.lisn.upsaclay.fr/competitions/13377, https://codalab.lisn.upsaclay.fr/competitions/13378.
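
The reported F-scores fall as more aspects of a relation must be correct at once. The sketch below illustrates why, by scoring the same toy predictions at four increasingly strict levels. It is not the official BioRED evaluation script: the `Relation` type, the level definitions (in particular, scoring novelty as pair plus novelty label), and the placeholder labels `type_x`, `type_y`, `novel` and `known` are assumptions for illustration.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """Hypothetical record for one document-level relation."""
    doc_id: str
    entity_a: str  # normalized identifier of the first entity
    entity_b: str  # normalized identifier of the second entity
    rel_type: str
    novelty: str


def pair_key(r: Relation):
    # The pair is treated as unordered, so (E1, E2) matches (E2, E1).
    return (r.doc_id, frozenset((r.entity_a, r.entity_b)))


# Each level adds requirements on top of the entity pair (assumed definitions).
LEVELS = {
    "pair": lambda r: pair_key(r),
    "pair+type": lambda r: pair_key(r) + (r.rel_type,),
    "pair+novelty": lambda r: pair_key(r) + (r.novelty,),
    "all": lambda r: pair_key(r) + (r.rel_type, r.novelty),
}


def f1(gold: set, pred: set) -> float:
    """F1 over two sets of keys; 0.0 when there are no true positives."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def score(gold, pred) -> dict:
    """F1 at every level for the same gold and predicted relations."""
    return {
        name: f1({key(r) for r in gold}, {key(r) for r in pred})
        for name, key in LEVELS.items()
    }


# Toy data, invented for this sketch.
gold_relations = [
    Relation("doc1", "E1", "E2", "type_x", "novel"),
    Relation("doc1", "E2", "E3", "type_x", "known"),
]
pred_relations = [
    Relation("doc1", "E2", "E1", "type_x", "known"),  # right type, wrong novelty
    Relation("doc1", "E2", "E3", "type_y", "known"),  # wrong type, right novelty
]

relation_scores = score(gold_relations, pred_relations)
for level, value in relation_scores.items():
    print(f"{level}: {value:.2f}")
```

In this toy case both pairs are found, each single-aspect level is half right, and no prediction is correct on every aspect, which mirrors the ordering of the scores reported for both subtasks.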

Articles published on Entities In Text

[formula omitted]: A multimodal misinformation dataset for media authenticity analysis

An ontology-based text mining dataset for extraction of process-structure-property entities

Power text information extraction based on multi-task learning

HunFlair2 in a cross-corpus evaluation of biomedical named entity recognition and normalization tools.

SWENER-1800

An implicit aspect-based sentiment analysis method using supervised contrastive learning and knowledge embedding

Integrating graph convolutional networks to enhance prompt learning for biomedical relation extraction

Hierarchical visual semantic guidance for enhanced relationship recognition in domain knowledge graphs

Analysis of Fault Events in Rail Transit Vehicle Traction Systems Based on Knowledge Graph Reasoning

The overview of the BioRED (Biomedical Relation Extraction Dataset) track at BioCreative VIII.

Aspect Based Sentiment Analysis using Deep Learning Algorithm: A Review

Named Entity Recognition for Chinese Texts on Marine Coral Reef Ecosystems Based on the BERT-BiGRU-Att-CRF Model

Enhancing text-based knowledge graph completion with zero-shot large language models: A focus on semantic enhancement

HAGCN: A relation extraction model based on heterogeneous graph convolutional neural network and graph attention

Building a challenging medical dataset for comparative evaluation of classifier capabilities

Improving biomedical Named Entity Recognition with additional external contexts

A joint extraction method for fault text entity relationships in smart grid considering nested entities and complex semantics

Improving few-shot named entity recognition via Semantics induced Optimal Transport

Chinese satellite frequency and orbit entity relation extraction method based on dynamic integrated learning

Language model based on deep learning network for biomedical named entity recognition
