Abstract

Text classification is a central part of natural language processing, with important applications in understanding the knowledge contained in biomedical texts, including electronic health records (EHRs). In this article, we propose a novel heterogeneous graph convolutional network method for classifying EHR texts. Our method, called EHR-HGCN, combines context-sensitive word and sentence embeddings with structural sentence-level and word-level relation information to perform text classification. EHR-HGCN reframes EHR text classification as a graph classification task, using a heterogeneous graph to better capture structural information about the document. To mine contextual information from a document, EHR-HGCN first applies a bidirectional recurrent neural network (BiRNN) to word embeddings obtained via Global Vectors for Word Representation (GloVe), yielding context-sensitive word-level and sentence-level embeddings. To mine structural relationships from the document, EHR-HGCN then constructs a heterogeneous graph over the word and sentence embeddings, in which sentence-word and word-word relationships are represented by graph edges. Finally, a heterogeneous graph convolutional network classifies documents by their graph representation. We evaluate EHR-HGCN on a variety of standard text classification benchmarks and find that it achieves higher accuracy and F1-score than other representative machine learning and deep learning methods. We also apply EHR-HGCN to the MedLit benchmark and find that it achieves high accuracy and F1-score on the task of section classification in EHR texts. Our ablation experiments show that the heterogeneous graph construction and the heterogeneous graph convolutional network are critical to the performance of EHR-HGCN.
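
The graph construction step can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how word-word relationships are defined, so the sketch assumes co-occurrence within a sliding window, and the function name `build_hetero_graph`, the `window` parameter, and the node-id scheme are all hypothetical. Embeddings (GloVe, BiRNN) and the convolution itself are omitted; only the two node types and two edge types named above are shown.

```python
from itertools import combinations


def build_hetero_graph(sentences, window=3):
    """Build a toy heterogeneous graph from a tokenized document.

    sentences: list of token lists, one list per sentence.
    window: hypothetical sliding-window size for word-word edges.
    Returns (nodes, edges): nodes maps node id -> node type, and edges
    is a set of (edge_type, u, v) triples over node ids.
    """
    nodes = {}
    edges = set()
    for s_idx, tokens in enumerate(sentences):
        sent_id = f"sent:{s_idx}"
        nodes[sent_id] = "sentence"
        # Sentence-word edges: a sentence links to every word it contains.
        for tok in tokens:
            word_id = f"word:{tok.lower()}"
            nodes[word_id] = "word"
            edges.add(("sentence-word", sent_id, word_id))
        # Word-word edges: ASSUMED to mean co-occurrence inside a sliding
        # window; the source does not specify the actual criterion.
        for start in range(max(1, len(tokens) - window + 1)):
            span = sorted({f"word:{t.lower()}" for t in tokens[start:start + window]})
            for u, v in combinations(span, 2):
                edges.add(("word-word", u, v))
    return nodes, edges


# Usage: a two-sentence toy document.
doc = [["Patient", "denies", "fever"], ["Fever", "resolved"]]
nodes, edges = build_hetero_graph(doc, window=2)
```

Because word nodes are shared across sentences (here, "fever" appears in both), the sentence-word edges connect sentences through their common vocabulary, which is the kind of document-level structure a graph classifier can exploit.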
