Abstract

Relation extraction is a necessary step in obtaining information from clinical medical records. In the medical domain, several studies have addressed relation extraction from modern-medicine clinical notes written in English. However, very little relation extraction research has been conducted on clinical notes written in Chinese, especially traditional Chinese medicine (TCM) clinical records (e.g., herb-symptom and herb-disease relations). Instead of independently extracting each relation from a single sentence or text, we propose to globally and reasonably extract multiple types of relations from Chinese clinical records with a novel heterogeneous graph representation learning method. Specifically, we first construct multi-view medical entity graphs, in which each edge is a candidate relation, based on co-occurrence relations, knowledge obtained from the clinic, and domain texts carrying the corresponding information for each pair of medical entities from the Chinese clinical records; we then build a Graph Convolutional Network (GCN)-based representation learning model with an attention mechanism to simultaneously infer the existence of all edges via classification. The experimental data were obtained from the Chinese medical records and literature provided by previous work. The main experimental results on Chinese clinical records show that our proposed model’s precision, recall, and F1-score reach 10.2%, 13.5%, and 12.6%, respectively, demonstrating significant improvements over the state of the art.
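
As a rough illustration of the graph-construction idea in the abstract, the sketch below builds a single co-occurrence view in which every co-occurring entity pair becomes a candidate edge, then applies one weight-free, GCN-style propagation step using the symmetrically normalized adjacency with self-loops. It is not the authors' implementation: the toy records, the function names (`cooccurrence_edges`, `normalized_adjacency`, `propagate`), and the one-hot input features are all assumptions made for the example, and the learned weights, attention over views, and edge classifier are omitted.

```python
from collections import Counter
from itertools import combinations
import math

# Made-up toy records: each set holds the entities mentioned in one record.
records = [
    {"ginseng", "fatigue", "qi deficiency"},
    {"ginseng", "fatigue"},
    {"licorice", "cough"},
]

def cooccurrence_edges(records):
    """Count how often each unordered entity pair appears in the same record."""
    counts = Counter()
    for entities in records:
        for a, b in combinations(sorted(entities), 2):
            counts[(a, b)] += 1
    return counts

def normalized_adjacency(nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} as a dense list of lists."""
    idx = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    # Start from the identity so that every node has a self-loop.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (u, v), weight in edges.items():
        a[idx[u]][idx[v]] = a[idx[v]][idx[u]] = float(weight)
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def propagate(a_hat, features):
    """One propagation step without learned weights: H' = A_hat @ H."""
    dim = len(features[0])
    return [[sum(a_hat[i][k] * features[k][d] for k in range(len(features)))
             for d in range(dim)]
            for i in range(len(a_hat))]

edges = cooccurrence_edges(records)               # candidate relations
nodes = sorted({name for pair in edges for name in pair})
a_hat = normalized_adjacency(nodes, edges)
one_hot = [[1.0 if i == j else 0.0 for j in range(len(nodes))]
           for i in range(len(nodes))]
hidden = propagate(a_hat, one_hot)                # neighborhood-mixed features
```

In a fuller model, each view would presumably get its own graph of this kind, the propagated representations would be combined (for instance with attention weights), and a classifier would score each candidate edge; none of that is shown here.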

Highlights

  • Despite the rise of semi-structured and structured data, text is still the most widespread content in the real world

  • We propose a new approach to relation extraction that globally and reasonably extracts relations from the entire corpus of Chinese clinical records from the perspective of graph representation learning

  • We propose a focused heterogeneous graph representation learning model to jointly learn the task of relation extraction (RE)

Introduction

Despite the rise of semi-structured and structured data, text is still the most widespread content in the real world. Extracting meaningful information and knowledge from free text is the subject of considerable research interest in the natural language processing (NLP) field. With the tremendous growth in the adoption of electronic medical records (EMRs), a vast amount of medical information, such as the relation between treatment and disease, is becoming available as a treasure trove for large-scale health data analysis [1], [2]. Most of the information in current medical records is stored as natural language text, which data mining algorithms cannot process directly. To extract medical information from EMRs, researchers generally use entity and relation extraction algorithms, whose output can be processed directly by conventional data mining algorithms. As a kind of EMR, Chinese electronic medical records contain much information about
