Learning for clinical named entity recognition without manual annotations

Omid Ghiasvand,Rohit J Kate

doi:10.1016/j.imu.2018.10.011

Omid Ghiasvand, Rohit J Kate

Open Access

https://doi.org/10.1016/j.imu.2018.10.011

Copy DOI

Abstract

Abstract Background Named entity recognition (NER) systems are commonly built using supervised methods that use machine learning to learn from corpora manually annotated with named entities. However, manually annotating corpora is very expensive and laborious. Materials and methods In this paper, a novel method is presented for training clinical NER systems that does not require any manual annotations. It only requires a raw text corpus and a resource like UMLS that can give a list of named entities along with their semantic types. Using these two resources, annotations are automatically obtained to train machine learning methods. The method was evaluated on the NER shared-task datasets of i2b2 2010 and SemEval 2014. Results On the SemEval 2014 dataset for recognizing diseases and disorders, the method obtained F-measure of 0.693 for exact matching and of 0.773 allowing overlaps. This is comparable to many supervised systems in the past that had used manual annotations for training. On the i2b2 2010 dataset for recognizing problems, tests and treatments, the method obtained F-measures of 0.451, 0.338 and 0.204 respectively for exact matching and of 0.721, 0.587 and 0.475 respectively allowing overlaps. These results are better than an existing unsupervised method. Conclusions Experiments on standard datasets showed that the new method performed well. The method is general and could be applied to recognize entities of other types on other genres of text without needing manual annotations.

Full Text