Abstract

In this article, we demonstrate the impact of interactive machine learning: we develop a biomedical entity recognition dataset using a human-in-the-loop approach. In contrast to classical machine learning, human-in-the-loop approaches do not operate on predefined training or test sets, but assume that human input regarding system improvement is supplied iteratively. Here, during annotation, a machine learning model is built on previous annotations and used to propose labels for subsequent annotation. To demonstrate that such interactive and iterative annotation speeds up the development of quality annotated datasets, we conduct three experiments. In the first experiment, we carry out an iterative annotation simulation and show that only a handful of medical abstracts need to be annotated to produce suggestions that increase annotation speed. In the second experiment, clinical doctors conduct a case study in annotating medical terms in documents relevant to their research. The third experiment explores the annotation of semantic relations, with relation instances learned across documents. The experiments validate our method qualitatively and quantitatively, and give rise to a more personalized, responsive information extraction technology.

Highlights

  • We investigated the impact of adaptive machine learning for the annotation of quality training data

  • Identifying the need for entity tagging in applications such as information extraction (IE), document summarization, fact exploration, and relation extraction, and recognizing the annotation acquisition bottleneck, which is especially severe in the medical domain, we carried out three experiments that show the utility of a human-in-the-loop approach for suggesting annotations in order to speed up the process and widen this bottleneck

Summary

Introduction and motivation

The biomedical domain is increasingly turning into a data-intensive science, and one challenge with regard to the ever-increasing body of medical literature is not only to extract meaningful information from this data, but to gain knowledge and insight, and to make sense of the data [1]. The human-in-the-loop automation approach enables users to start the automation process without pre-existing annotations, and works by suggesting annotations as soon as the users have annotated a rather small number of documents. This annotate-little and predict-little strategy is deemed adequate for biomedical domains as it (1) produces quality annotations in a very short period of time, (2) is adaptive in such a way that newly evolving concepts or entities will not be ignored by an old and static classification model, and (3) lets the conceptualization (i.e. the entity types and their typed relations) be chosen and extended by the user during the annotation process. Part of this article was already presented in a shorter form in [15].
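The following is a minimal sketch of this annotate-little and predict-little loop, not the system's actual implementation: it assumes a scikit-learn per-token classifier as a stand-in for the real sequence tagger, and a hypothetical gold_label callback that simulates the human annotator accepting or correcting each suggestion.

```python
# Sketch of the annotate-little, predict-little loop: the model trained on
# previously annotated batches proposes labels for the next batch, the
# annotator corrects them, and the model is retrained on all annotations.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def token_features(tokens, i):
    """Simple lexical features for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "suffix3": tok[-3:],
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def interactive_annotation(doc_batches, gold_label):
    """Iterate over batches of tokenized documents.

    doc_batches: iterable of batches, each a list of token lists.
    gold_label(tokens, i, suggestion): simulates the annotator, returning the
    accepted or corrected label for token i.
    """
    model = None
    X_all, y_all = [], []

    for batch in doc_batches:
        for tokens in batch:
            feats = [token_features(tokens, i) for i in range(len(tokens))]
            # Suggestions come from the model trained on earlier annotations;
            # before any model exists, everything defaults to "O" (no entity).
            suggestions = model.predict(feats) if model else ["O"] * len(tokens)
            # The annotator reviews and corrects the suggestions.
            corrected = [gold_label(tokens, i, s) for i, s in enumerate(suggestions)]
            X_all.extend(feats)
            y_all.extend(corrected)

        # Retrain on everything annotated so far, so the next batch
        # receives better suggestions.
        model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
        model.fit(X_all, y_all)

    return model
```

In the actual experiments, the suggestions are surfaced inside the WebAnno annotation tool, and the annotator's accept-or-correct decisions play the role of the simulated gold_label callback above.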

Human in the loop
Interactive and adaptive learning
NER for medical domains
Relation learning in the medical domain
Annotation learning
The WebAnno annotation tool
Medical NER tagging and relation extraction
Entity annotation
Entity automation and relation copy annotator
Simulating interactive learning
Entity automation
Relation copy annotator
Qualitative assessment
Findings
Conclusion and future outlook