HunFlair: an easy-to-use tool for state-of-the-art biomedical named entity recognition.

Leon Weber, Jannes Münchmeyer, Mario Sänger, Alan Akbik, Ulf Leser, Maryam Habibi

doi:10.1093/bioinformatics/btab042

Abstract

Summary: Named entity recognition (NER) is an important step in biomedical information extraction pipelines. Tools for NER should be easy to use, cover multiple entity types, be highly accurate and be robust toward variations in text genre and style. We present HunFlair, a NER tagger fulfilling these requirements. HunFlair is integrated into the widely used NLP framework Flair, recognizes five biomedical entity types, matches or exceeds state-of-the-art performance on a wide set of evaluation corpora, and is trained in a cross-corpus setting to avoid corpus-specific bias. Technically, it uses a character-level language model pretrained on roughly 24 million biomedical abstracts and three million full texts. It outperforms other off-the-shelf biomedical NER tools with an average gain of 7.26 pp over the next best tool in a cross-corpus setting and achieves on-par results with state-of-the-art research prototypes in in-corpus experiments. HunFlair can be installed with a single command and can be applied with only four lines of code. Furthermore, it is accompanied by harmonized versions of 23 biomedical NER corpora.

Availability and implementation: HunFlair is freely available through the Flair NLP framework (https://github.com/flairNLP/flair) under an MIT license and is compatible with all major operating systems.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

Recognizing biomedical entities (NER) such as genes, chemicals or diseases in unstructured scientific text is a crucial step of all biomedical information extraction pipelines
HUNER does not build upon a pretrained language model (LM), although such models were the basis for many recent breakthroughs in NLP research (Akbik et al., 2019)
We compare the tagging accuracy of HunFlair to two types of competitors: other ‘off-the-shelf’ biomedical NER tools and other recent research prototypes
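
The abstract reports gains in percentage points (pp) averaged over evaluation corpora. As a rough sketch of how such a figure could be computed, assuming entity-level F1 with exact span matching as the accuracy measure (this page does not specify the metric or the matching rule), the following uses made-up entity sets and scores. The names `f1_score` and `average_gain_pp`, the corpus keys and all numbers are hypothetical and are not results from the paper.

```python
from typing import Dict, Set, Tuple

# An entity mention: (start_offset, end_offset, entity_type).
Entity = Tuple[int, int, str]


def f1_score(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Entity-level F1 under exact span and type matching (an assumption)."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def average_gain_pp(tool_a: Dict[str, float], tool_b: Dict[str, float]) -> float:
    """Mean F1 difference (tool_a minus tool_b) in percentage points,
    taken over the corpora listed for tool_a."""
    gains = [100 * (tool_a[corpus] - tool_b[corpus]) for corpus in tool_a]
    return sum(gains) / len(gains)


# Made-up annotations for one document, for illustration only.
gold = {(0, 5, "Gene"), (10, 20, "Disease"), (30, 35, "Chemical")}
predicted = {(0, 5, "Gene"), (10, 20, "Disease"), (40, 45, "Gene")}
example_f1 = f1_score(gold, predicted)

# Made-up per-corpus F1 scores for two hypothetical tools.
scores_a = {"corpus_a": 0.80, "corpus_b": 0.70}
scores_b = {"corpus_a": 0.75, "corpus_b": 0.60}
example_gain = average_gain_pp(scores_a, scores_b)
```

Note that a difference in percentage points is an absolute difference between two F1 values, not a relative improvement.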

Summary

Introduction

Recognizing biomedical entities (NER) such as genes, chemicals or diseases in unstructured scientific text is a crucial step of all biomedical information extraction pipelines. In any real application, NER tools are applied ‘in the wild’, i.e. to a large collection of texts often varying in focus, entity distribution, genre (e.g. patents versus scientific articles) and text type (e.g. abstract versus full text). This mismatch between the texts a tool is evaluated on and the texts it is applied to can lead to severely misleading evaluation results. HunFlair builds upon a pretrained character-level language model. It recognizes five important biomedical entity types with high accuracy, namely Cell Lines, Chemicals, Diseases, Genes and Species. We integrate 23 biomedical NER corpora into HunFlair using a consistent format, which enables researchers and practitioners to rapidly train their own models and experiment with new approaches within Flair. Note that these are the same corpora that were already made available through HUNER. While HUNER’s corpora came preprocessed with a particular method, users of HunFlair may process the corpora according to their own choices, for instance by using different sentence or word segmentation methods.
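
To illustrate why leaving segmentation to the user matters, here is a minimal sketch (not HunFlair's or Flair's actual code) of projecting character-offset entity annotations onto tokens under two different word segmentation choices. The function names `whitespace_tokenize`, `punctuation_tokenize` and `to_bio`, the example sentence and its annotations are hypothetical and chosen for illustration only. The BIO tagging scheme is also an assumption, since the text above does not name the scheme used.

```python
import re
from typing import Callable, List, Tuple

# A token with its character offsets: (text, start, end).
Token = Tuple[str, int, int]
# An annotation stored independently of tokenization: (start, end, type).
Span = Tuple[int, int, str]


def whitespace_tokenize(text: str) -> List[Token]:
    """Split on whitespace only, keeping character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def punctuation_tokenize(text: str) -> List[Token]:
    """Separate words from punctuation, keeping character offsets."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def to_bio(text: str, spans: List[Span],
           tokenize: Callable[[str], List[Token]]) -> List[Tuple[str, str]]:
    """Project character-offset annotations onto tokens as BIO tags."""
    tagged = []
    for token, start, end in tokenize(text):
        tag = "O"
        for span_start, span_end, label in spans:
            # A token belongs to an entity if it overlaps the span at all.
            if start < span_end and end > span_start:
                # The token covering the span's first character gets B-.
                prefix = "B-" if start <= span_start else "I-"
                tag = prefix + label
                break
        tagged.append((token, tag))
    return tagged


# Hypothetical sentence with hand-written offsets.
text = "Mutations in BRCA1 cause breast cancer."
spans = [(13, 18, "Gene"), (25, 38, "Disease")]

coarse = to_bio(text, spans, whitespace_tokenize)
fine = to_bio(text, spans, punctuation_tokenize)
```

With whitespace splitting, the final token `cancer.` absorbs the period and is tagged as part of the disease mention; with punctuation splitting, the period becomes its own `O` token. The same offset-based annotations therefore yield different training sequences depending on the segmentation method.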

HunFlair

Results

Comparison to off-the-shelf tools

Comparison to research prototypes

Conclusion

Journal: Bioinformatics	Publication Date: Jan 28, 2021
Citations: 64	License type: CC BY 4.0

Similar Papers

BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition
Usman Naseem ... Matloob Khushi
18 Jul 2021

Dataset-aware multi-task learning approaches for biomedical named entity recognition.
Mei Zuo ... Yang Zhang
Bioinformatics | VOL. 36
16 May 2020

Cross-type biomedical named entity recognition with deep multi-task learning.
Xuan Wang ... Curtis Langlotz
Bioinformatics | VOL. 35
11 Oct 2018

Improving deep learning method for biomedical named entity recognition by using entity definition information
Ying Xiong ... Yi Zhou
BMC Bioinformatics | VOL. 22
01 Dec 2021
