Abstract

Medical text classification is a special case of text classification. Medical text includes medical records and medical literature, both of which are important clinical information resources. However, medical text contains complex medical vocabulary and medical measures, which lead to high dimensionality and data sparsity, so text classification in the medical domain is more challenging than in general domains. To address these problems, this paper proposes a unified neural network method. In the sentence representation, a convolutional layer extracts features from the sentence, and a bidirectional gated recurrent unit (BIGRU) is used to access both the preceding and the succeeding sentence features. An attention mechanism is then employed to obtain a sentence representation weighted by the important words. In the document representation, the method uses the BIGRU to encode the sentences obtained in the sentence representation and then decodes them through the attention mechanism to get a document representation weighted by the important sentences. Finally, the category of the medical text is obtained through a classifier. Experiments are conducted on four medical text datasets: two medical record datasets and two medical literature datasets. The results show that our method is effective.
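The two-level attention described above (word weights within a sentence, then sentence weights within a document) can be illustrated with a minimal sketch. It is not the paper's implementation: the convolutional layer and the BIGRU are omitted, the inputs are assumed to be already-encoded feature vectors, the scoring function is a plain dot product with a context vector (an assumption, since the abstract does not specify the scoring form), and the names `attention_pool`, `document_vector`, `word_context` and `sent_context` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, context):
    """Return (weighted sum of states, attention weights).

    Assumption: each weight is softmax(dot(state, context)); the paper
    may use a different scoring function.
    """
    scores = [sum(h_d * c_d for h_d, c_d in zip(h, context)) for h in states]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return pooled, weights


def document_vector(doc, word_context, sent_context):
    """Pool word vectors into sentence vectors, then sentences into a document.

    `doc` is a list of sentences; each sentence is a list of word feature
    vectors (stand-ins for the encoder outputs, which are not modeled here).
    """
    sent_vecs = [attention_pool(sentence, word_context)[0] for sentence in doc]
    return attention_pool(sent_vecs, sent_context)


# Toy document: two sentences with 2-dimensional word features.
doc = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
doc_vec, sent_weights = document_vector(doc, [1.0, 0.0], [0.0, 1.0])
```

In a full model the resulting document vector would be passed to a classifier, and the context vectors would be learned rather than fixed by hand.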

Highlights

  • As a classic task of natural language processing, text classification can quickly find the corresponding categories from massive data and realize automatic classification [1]

  • The method uses the bidirectional gated recurrent unit (BIGRU) to encode the sentences obtained in the sentence representation and decodes them through the attention mechanism to get the document representation with important sentence weights

  • In addition to convolutional neural networks, recurrent neural networks are widely used in text classification because of their natural sequence structure, which is suitable for natural language processing

Summary

Introduction

As a classic task of natural language processing, text classification can quickly find the corresponding categories from massive data and realize automatic classification [1]. Medical text comprises medical records and medical literature [2]. The former is a record of the doctor’s examination, diagnosis and treatment of the patient and of the development of the patient’s disease; it holds the detailed information about the patient during treatment. The latter is a documentary record of research results on the latest medical methods. To address the high dimensionality of medical texts, we propose a new hierarchical neural network method. The experimental results show that the proposed method is effective on medical records and medical literature, especially on medical records.

Gated Recurrent Unit
Text Classification in General Domain
Text Classification in Medical Domain
Our Work
Word Embedding
One Dimension Convolutional Layer
Bidirectional GRU and Attention Mechanism
Sentence Encoder
Sentence Decoder
Experimental Setup
Datasets
Parameter Settings
Baseline Methods
Overall Comparison
AIM
Effect of Parameter Tuning
Conclusions