Abstract

The recognition and extraction of biomedical names is an essential task in biomedical information extraction. However, the need to prepare large annotated corpora hinders the training of Named Entity Recognition (NER) systems. Active learning reduces the manual annotation work needed in supervised learning tasks. In this work, we propose a novel clustering-based active learning method for the biomedical NER task. We show that the underlying NER system using the proposed method outperforms those using other state-of-the-art active learning methods, including density-based, Gibbs-error-based and entropy-based approaches, as well as random selection. We compare variations of our proposed method and find that the optimal design uses the vector representation of named entities, selects documents that are both ‘representative’ and ‘informative’, and applies the Shared Nearest Neighbor (SNN) clustering approach. In particular, the optimal variant of the proposed method achieves a deficiency gain of 36.3% over random selection.

