Extracting entities with attributes in clinical text via joint deep learning.

Xue Shi,Ying Xiong,Yaoyun Zhang,Zongcheng Ji,Buzhou Tang,Yingping Yi,Qingcai Chen,Hua Xu,Xiaolong Wang

doi:10.1093/jamia/ocz158

Abstract

Extracting clinical entities and their attributes is a fundamental task of natural language processing (NLP) in the medical domain. This task is typically recognized as 2 sequential subtasks in a pipeline, clinical entity or attribute recognition followed by entity-attribute relation extraction. One problem of pipeline methods is that errors from entity recognition are unavoidably passed to relation extraction. We propose a novel joint deep learning method to recognize clinical entities or attributes and extract entity-attribute relations simultaneously. The proposed method integrates 2 state-of-the-art methods for named entity recognition and relation extraction, namely bidirectional long short-term memory with conditional random field and bidirectional long short-term memory, into a unified framework. In this method, relation constraints between clinical entities and attributes and weights of the 2 subtasks are also considered simultaneously. We compare the method with other related methods (ie, pipeline methods and other joint deep learning methods) on an existing English corpus from SemEval-2015 and a newly developed Chinese corpus. Our proposed method achieves the best F1 of 74.46% on entity recognition and the best F1 of 50.21% on relation extraction on the English corpus, and 89.32% and 88.13% on the Chinese corpora, respectively, which outperform the other methods on both tasks. The joint deep learning-based method could improve both entity recognition and relation extraction from clinical text in both English and Chinese, indicating that the approach is promising.

Full Text