A hybrid medical text classification framework: Integrating attentive rule construction and neural network

Xiang Li, Menglin Cui, Jingpeng Li, Ruibin Bai, Zheng Lu, Uwe Aickelin

doi:10.1016/j.neucom.2021.02.069

Abstract

The main objective of this work is to improve the quality and transparency of medical text classification solutions. Conventional text classification methods offer users only a restricted, frequency-based mechanism for selecting features. In this paper, we propose a three-stage hybrid method for medical text classification that combines a gated attention-based bi-directional Long Short-Term Memory (ABLSTM) network with a regular-expression-based classifier. The bi-directional LSTM architecture with an attention layer allows the network to weigh words according to their perceived importance and to focus on the crucial parts of a sentence. Feature words (keywords) extracted by the ABLSTM model are then used to guide the construction of regular expression rules. The proposed approach leverages both the interpretability of rule-based algorithms and the computational power of deep learning, which makes it suitable for production-ready scenarios. Experimental results on real-world online medical query data validate the superiority of our system in selecting domain-specific and topic-related features: the proposed approach achieves an accuracy of 0.89 and an F1-score of 0.92. Furthermore, our experiments illustrate the versatility of regular expressions as a user-level tool for focusing on desired patterns and for providing interpretable solutions that humans can modify.
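The word-weighting step described above can be illustrated with a minimal sketch. It assumes a standard additive attention layer (score e_t = v·tanh(W h_t), weights α = softmax(e)) rather than the paper's exact gated variant, and it uses hand-picked toy vectors in place of real BiLSTM outputs. The names `additive_attention` and `top_keywords` are hypothetical. Ranking tokens by attention weight is one simple way to surface candidate keywords.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(
    hidden: Sequence[Sequence[float]],
    w: Sequence[Sequence[float]],
    v: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Score each hidden state with e_t = v . tanh(W h_t), normalise, and pool."""
    scores = []
    for h in hidden:
        # W h_t followed by an element-wise tanh
        proj = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h))) for row in w]
        scores.append(sum(v_i * p_i for v_i, p_i in zip(v, proj)))
    alpha = softmax(scores)  # one weight per token, summing to 1
    dim = len(hidden[0])
    # Sentence vector: attention-weighted sum of the hidden states
    context = [sum(a * h[d] for a, h in zip(alpha, hidden)) for d in range(dim)]
    return alpha, context


def top_keywords(tokens: Sequence[str], alpha: Sequence[float], k: int = 3) -> List[str]:
    """Return the k tokens with the largest attention weights."""
    ranked = sorted(zip(tokens, alpha), key=lambda pair: pair[1], reverse=True)
    return [tok for tok, _ in ranked[:k]]


# Toy 2-d "BiLSTM outputs" chosen by hand for illustration (not from a trained model).
tokens = ["my", "underbelly", "aches", "during", "every", "menstrual", "period"]
hidden = [[0.0, 0.0], [2.0, 1.0], [1.0, 1.5], [0.1, 0.0], [0.0, 0.2], [1.5, 1.5], [0.5, 0.5]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection (assumed)
v = [1.0, 1.0]                # scoring vector (assumed)

alpha, context = additive_attention(hidden, W, v)
print(top_keywords(tokens, alpha))  # ['menstrual', 'underbelly', 'aches']
```

In a full system the hidden states would come from a trained bi-directional LSTM and `W`, `v` would be learned; only the final ranking step hands keyword candidates on to rule construction.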

Highlights

Text classification is a well-established field related to Natural Language Processing (NLP)
We find that the attentive bi-directional Long Short-Term Memory (ABLSTM) network can identify the keywords most relevant to a sentence's meaning and use them to guide the construction of regular expressions
In real-world applications, such a system is better suited to production-ready scenarios than existing text classification models based solely on deep learning
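The idea of keywords guiding rule construction can be sketched as follows, assuming that each keyword surfaced by the attention model is widened by a human editor into a group of synonyms, and that a rule fires when every group matches somewhere in the query. The `build_rule` helper and the keyword groups are hypothetical illustrations, not the authors' rule format.

```python
import re
from typing import Sequence


def build_rule(keyword_groups: Sequence[Sequence[str]]) -> re.Pattern:
    """Compile an order-independent rule: every group must match somewhere in the text.

    Each group is a list of regex fragments (a keyword plus hand-added synonyms).
    One lookahead per group means the groups may appear in any order.
    """
    lookaheads = []
    for group in keyword_groups:
        alternation = "|".join(group)
        lookaheads.append(rf"(?=.*\b(?:{alternation})\b)")
    return re.compile("".join(lookaheads), re.IGNORECASE | re.DOTALL)


# Keywords an attention model might surface, each widened by a human editor (hypothetical).
rule = build_rule([
    ["underbelly", "lower abdomen", r"stomach\w*"],  # location group
    [r"menstru\w*", "period"],                        # timing group
])

print(bool(rule.search("My underbelly aches during every menstrual period.")))  # True
print(bool(rule.search("My stomach hurts after dinner.")))                    # False
```

Using one lookahead per group makes the rule insensitive to word order, which suits free-form queries; the trade-off is that a rule written this way cannot express ordering constraints or negation without additional patterns.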

Summary

Introduction

Text classification is a well-established field related to Natural Language Processing (NLP). Accurate and precise decision making is often required. NLP for medical text is usually challenging because a great deal of domain knowledge is required to solve even a seemingly simple problem [1]. We are motivated by a real-world problem concerning online medical queries, our main processing context, which are written in a narrative format that retains their ambiguity and informality. For the medical category “female hypogastralgia”, for example, queries received from users include “My underbelly aches during every menstrual period.”, “I had a stomachache and menstruation didnt come on time.”, and “Feeling lower abdomen swells and backache after menstruation.” Forms of expression vary from one person to another, which makes the underlying patterns harder to discover than those in ordinary written text. Most current approaches rely on deep neural networks to perform text classi-
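To show how regular expressions can absorb this variation in phrasing, here is a minimal sketch of a rule-based classifier applied to the three example queries. The rule set, the second category, and the `classify` function are hypothetical; the patterns were written by hand for these sentences and are not rules from the paper.

```python
import re
from typing import Dict, List

# Hypothetical hand-maintained rule set: category -> compiled regular expression.
RULES: Dict[str, re.Pattern] = {
    "female hypogastralgia": re.compile(
        r"(?=.*\b(?:underbelly|lower abdomen|belly|stomach\w*)\b)"  # where it hurts
        r"(?=.*\b(?:menstru\w*|period)\b)",                          # when it hurts
        re.IGNORECASE,
    ),
    "headache": re.compile(r"\b(?:headache|migraine)s?\b", re.IGNORECASE),
}


def classify(query: str) -> List[str]:
    """Return every category whose rule fires; an empty list means no rule matched."""
    return [label for label, pattern in RULES.items() if pattern.search(query)]


queries = [
    "My underbelly aches during every menstrual period.",
    "I had a stomachache and menstruation didnt come on time.",
    "Feeling lower abdomen swells and backache after menstruation.",
]
for q in queries:
    print(classify(q))  # ['female hypogastralgia'] for each query
```

Because each rule is plain text, a domain expert can see why a query was assigned to a category and extend a synonym list without retraining a model.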

Objectives

Methods

Results

Conclusion