Regular Expression Based Medical Text Classification Using Constructive Heuristic Approach

Menglin Cui,Xiang Li,Peiming Ge,Zheng Lu,Ruibin Bai,Uwe Aickelin

doi:10.1109/access.2019.2946622

Open Access

https://doi.org/10.1109/access.2019.2946622

Abstract

Medical text classification assigns medical-related text to categories such as topics or disease types. Machine learning-based techniques have been widely used for such tasks despite the obvious drawback of such a “black box” approach, which leaves no easy way to fine-tune the resulting model for better performance. We propose a novel constructive heuristic approach to generate a set of regular expressions that can be used as effective text classifiers. The main innovation of our approach is a novel regular expression-based text classifier with both satisfactory classification performance and excellent interpretability. We evaluate our framework on real-world medical data provided by our collaborator, one of the largest online healthcare providers in the market, and observe the high performance and consistency of this approach. Experimental results show that the machine-generated regular expressions can be used effectively in conjunction with machine learning techniques to perform medical text classification tasks. The proposed methodology improves the performance of baseline methods (Naive Bayes and Support Vector Machines) by 9% in precision and 4.5% in recall. We also evaluate the performance of regular expressions modified by human experts and demonstrate the potential of the proposed method for practical applications.

Highlights

Despite the popularity of Electronic Medical Record Systems, there is still a large amount of unstructured text data in the medical domain
The regex-based classifier narrows the gap between the macro and micro F0.5 scores given by Naive Bayes (NB) and Support Vector Machines (SVM) models, indicating that the regular expressions improve performance on the classes with fewer samples, on which machine learning models generally do not perform well
Regular expressions have long been used for text processing because of their expressiveness and flexibility
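
To illustrate the macro versus micro F0.5 gap mentioned in the highlights, here is a minimal sketch in Python. It is not taken from the paper: the function names `f_beta` and `macro_micro_f_beta` and the toy labels are assumptions made for illustration, and single-label classification is assumed. Macro averaging weights every class equally, so a poorly handled small class pulls it down, while micro averaging pools the counts and is dominated by the large classes.

```python
from collections import Counter


def f_beta(tp, fp, fn, beta=0.5):
    """F-beta from raw counts; returns 0.0 when the score is undefined."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def macro_micro_f_beta(y_true, y_pred, beta=0.5):
    """Return (macro, micro) F-beta for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    # Macro: average the per-class scores, each class counting equally.
    macro = sum(f_beta(tp[c], fp[c], fn[c], beta) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then score once.
    micro = f_beta(sum(tp.values()), sum(fp.values()), sum(fn.values()), beta)
    return macro, micro


# Toy data (hypothetical): the small class "rare" is always missed.
y_true = ["flu"] * 8 + ["rare"] * 2
y_pred = ["flu"] * 10
macro, micro = macro_micro_f_beta(y_true, y_pred)
print(f"macro F0.5 = {macro:.3f}, micro F0.5 = {micro:.3f}")
```

In this toy case the micro score stays high because the large class is classified well, while the macro score drops because the small class scores zero. A classifier that recovers the small class raises the macro score and narrows the gap.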

Summary

INTRODUCTION

Despite the popularity of Electronic Medical Record Systems, there is still a large amount of unstructured text data in the medical domain. Colloquial expressions of medical terms are difficult to process with natural language processing (NLP) tools developed for ordinary text [7]. To address these issues, we investigate an automated regular expression generation method to classify medical texts in order to provide informative and comprehensive human-like medical guidance. Medical text classification approaches should aim to achieve better performance (in terms of precision and recall, for example) and at the same time allow human experts to modify the solutions for even better results. Our regular expression-based system is transparent and interpretable, so domain experts can make further modifications, whereas a system built on sophisticated, hard-to-understand machine learning techniques may require additional effort to achieve this goal.
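
As a rough illustration of why a regular expression-based classifier is easy for a domain expert to inspect and edit, consider the following sketch. It is not the authors' generation method: the class names, the English patterns, and the helpers `compile_rules` and `classify` are all hypothetical, and the rules are written by hand rather than produced by a constructive heuristic.

```python
import re

# Hypothetical, hand-written rules: each class maps to a list of patterns.
RULES = {
    "respiratory": [r"\bcough(ing)?\b.*\bfever\b", r"\bshort(ness)? of breath\b"],
    "dermatology": [r"\b(rash|itch(y|ing)?)\b"],
}


def compile_rules(rules):
    """Compile every pattern once, ignoring letter case."""
    return {
        label: [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
        for label, patterns in rules.items()
    }


def classify(text, compiled, default="unknown"):
    """Return the label whose patterns match most often, else `default`."""
    scores = {
        label: sum(1 for regex in patterns if regex.search(text))
        for label, patterns in compiled.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


compiled = compile_rules(RULES)
print(classify("I have been coughing and have a fever", compiled))
print(classify("My arm is itchy", compiled))

# An expert can refine a class by editing or adding a readable pattern.
RULES["dermatology"].append(r"\bhives\b")
compiled = compile_rules(RULES)
print(classify("I woke up with hives", compiled))
```

Every decision can be traced to a specific pattern, which is what makes manual refinement straightforward here, in contrast to adjusting the weights of a trained model.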

RELATED WORK

PROBLEM DESCRIPTION

REGULAR EXPRESSION DEFINITION

GENERALITY OF A REGULAR EXPRESSION

REGULAR EXPRESSIONS GENERATION

CONSTRUCTIVE HEURISTIC METHOD

EXPERIMENTS

Findings

CONCLUSIONS AND EXTENSIONS

Journal: IEEE Access
Publication Date: Jan 1, 2019
Citations: 28
License type: CC BY 4.0

Similar Papers

Applicability of Machine Learning Methods to Multi-label Medical Text Classification
Iuliia Lenivtceva ... Georgy Kopanitsa
01 Jan 2020

A hybrid medical text classification framework: Integrating attentive rule construction and neural network
Xiang Li ... Uwe Aickelin
Neurocomputing | VOL. 443
10 Mar 2021

Medical Text Classification Using Hybrid Deep Learning Models with Multihead Attention.
Sunil Kumar Prabhakar ... Dong-Ok Won
Computational Intelligence and Neuroscience | VOL. 2021
01 Jan 2020

Learning Regular Expressions for Interpretable Medical Text Classification Using a Pool-based Simulated Annealing Approach
Chaofan Tu ... Menglin Cui
01 Jul 2020
