Text classification is a central task in natural language processing, with important applications in extracting knowledge from biomedical texts, including electronic health records (EHRs). In this article, we propose a novel heterogeneous graph convolutional network method for classifying EHR texts. Our method, called EHR-HGCN, combines context-sensitive word and sentence embeddings with structural sentence-level and word-level relation information to perform text classification. EHR-HGCN reframes EHR text classification as a graph classification task, using a heterogeneous graph to better capture structural information about the document. To mine contextual information from a document, EHR-HGCN first applies a bidirectional recurrent neural network (BiRNN) to word embeddings obtained via Global Vectors for Word Representation (GloVe), yielding context-sensitive word-level and sentence-level embeddings. To mine structural relationships from the document, EHR-HGCN then constructs a heterogeneous graph over the word and sentence embeddings, in which sentence-word and word-word relationships are represented by graph edges. Finally, a heterogeneous graph convolutional neural network classifies each document from its graph representation. We evaluate EHR-HGCN on a variety of standard text classification benchmarks and find that it achieves higher accuracy and F1-score than other representative machine learning and deep learning methods. We also apply EHR-HGCN to the MedLit benchmark and find that it achieves high accuracy and F1-score on the task of section classification in EHR texts. Our ablation experiments show that the heterogeneous graph construction and the heterogeneous graph convolutional network are both critical to the performance of EHR-HGCN.
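The graph-construction step can be illustrated with a minimal sketch. The abstract states only that sentence-word and word-word relationships become edges; it does not say how those relationships are defined. The sketch below therefore assumes, purely for illustration, that a sentence is linked to every word it contains and that two words are linked when they co-occur within a sliding window. The function name `build_hetero_graph`, the `window` parameter, and the co-occurrence rule are hypothetical and are not taken from EHR-HGCN.

```python
from collections import defaultdict
from itertools import combinations


def build_hetero_graph(sentences, window=3):
    """Build a toy heterogeneous graph from tokenized sentences.

    Nodes are ("sent", i) for each sentence and ("word", w) for each
    distinct word. Sentence-word edges record membership; word-word
    edges count the sliding windows in which two words co-occur.
    ASSUMPTION: the window-based rule is illustrative only and is not
    the edge definition used by EHR-HGCN.
    """
    sent_word = set()
    word_word = defaultdict(int)
    for i, tokens in enumerate(sentences):
        # Sentence-word edges: link the sentence node to each of its words.
        for w in tokens:
            sent_word.add((("sent", i), ("word", w)))
        # Word-word edges: slide a fixed-size window across the sentence.
        # Sentences shorter than the window are treated as one window.
        for start in range(max(1, len(tokens) - window + 1)):
            span = sorted(set(tokens[start:start + window]))
            for a, b in combinations(span, 2):
                word_word[(a, b)] += 1
    return sent_word, dict(word_word)


if __name__ == "__main__":
    doc = [
        ["patient", "denies", "chest", "pain"],
        ["chest", "pain", "resolved"],
    ]
    sent_word, word_word = build_hetero_graph(doc, window=2)
    print(len(sent_word), "sentence-word edges")
    print(word_word)
```

In a full pipeline, each node would carry the BiRNN-derived embedding as its feature vector, and the two edge sets would be passed to a heterogeneous graph convolution layer as separate relation types; both steps are omitted here.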