Abstract

Background: Semantic annotators and Natural Language Processing (NLP) methods for Named Entity Recognition and Linking (NER+L) require plenty of training and test data, especially in the biomedical domain. Despite the abundance of unstructured biomedical data, the lack of richly annotated biomedical datasets hinders the further development of NER+L algorithms for any effective secondary use. In addition, manual annotation of biomedical documents performed by physicians and experts is a costly and time-consuming task. To support, organize, and speed up the annotation process, we introduce MedTAG, a collaborative biomedical annotation tool that is open-source, platform-independent, and free to use and distribute.

Results: We present the main features of MedTAG and how it has been employed in the histopathology domain by physicians and experts to manually annotate more than seven thousand clinical reports. We compare MedTAG with a set of well-established biomedical annotation tools, including BioQRator, ezTag, MyMiner, and tagtog, weighing their pros and cons against those of MedTAG. We highlight that MedTAG is one of the very few open-source tools provided with an open license and a straightforward installation procedure supporting cross-platform use.

Conclusions: MedTAG has been designed according to five requirements (i.e. available, distributable, installable, workable and schematic) defined in a recent extensive review of manual annotation tools. Moreover, MedTAG satisfies 20 out of 22 criteria specified in the same study.

Highlights

  • Semantic annotators and Natural Language Processing (NLP) methods for Named Entity Recognition and Linking (NER+L) require plenty of training and test data, especially in the biomedical domain

  • ExaTAG is an instance of MedTAG tailored for the histopathology domain

  • We describe an instance of MedTAG adopted in the histopathology domain, where it has been used by physicians to annotate more than seven thousand clinical reports in three languages (Dutch, English, and Italian) from two healthcare institutions

Introduction

Semantic annotators and Natural Language Processing (NLP) methods for Named Entity Recognition and Linking (NER+L) require plenty of training and test data, especially in the biomedical domain. Narrative clinical reports are usually conceived as free-text documents, which are human-readable but not machine-readable. This brings interoperability issues and limitations to the effective secondary reuse of data, which is essential for medical decision making and support. To extract structured information from Electronic Health Records (EHRs), Information Extraction (IE) algorithms and NLP techniques have been developed and are currently exploited. To this aim, significant efforts have been dedicated to applying Named Entity Recognition and Linking (NER+L) methods for entity extraction and semantic annotation [2,3,4,5,6]. Semantic annotation is the NLP task of identifying the type of an entity and uniquely linking it to a corresponding knowledge base entry [7]; it leverages both text-processing and Machine Learning (ML) techniques to tackle biomedical information extraction challenges such as term and abbreviation disambiguation.
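To make the notion of a semantic annotation concrete, the following is a minimal Python sketch of how a single NER+L annotation could be represented: a character span in the report, a semantic type, and a link to a knowledge-base identifier. This is an illustrative assumption, not MedTAG's actual data model; the entity type and the SNOMED CT code shown are likewise examples rather than verified values.

# Minimal sketch (not MedTAG's actual data model) of a NER+L annotation:
# a text span is typed and linked to a knowledge-base entry.
from dataclasses import dataclass

@dataclass
class EntityAnnotation:
    start: int        # character offset where the mention begins
    end: int          # character offset where the mention ends (exclusive)
    mention: str      # surface form found in the report
    entity_type: str  # semantic type assigned by the annotator
    kb_id: str        # identifier of the linked knowledge-base entry

report = "Biopsy shows adenocarcinoma of the colon."
annotation = EntityAnnotation(
    start=13,
    end=27,
    mention=report[13:27],          # "adenocarcinoma"
    entity_type="Neoplastic Process",
    kb_id="SNOMEDCT:35917007",      # illustrative code, not verified
)
print(annotation)

A collection of such records, paired with the source text, is the kind of richly annotated data that NER+L methods need for training and evaluation, and that tools like MedTAG help produce.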
