Learning regular expressions for clinical text classification

D D A Bui, Q Zeng-Treitler

doi:10.1136/amiajnl-2013-002411

Abstract

Natural language processing (NLP) applications typically use regular expressions that have been developed manually by human experts. Our goal is to automate both the creation and utilization of regular expressions in text classification.

We designed a novel regular expression discovery (RED) algorithm and implemented two text classifiers based on RED. The RED+ALIGN classifier combines RED with an alignment algorithm, and RED+SVM combines RED with a support vector machine (SVM) classifier. Two clinical datasets were used for testing and evaluation: the SMOKE dataset, containing 1091 text snippets describing smoking status; and the PAIN dataset, containing 702 snippets describing pain status. We performed 10-fold cross-validation to calculate accuracy, precision, recall, and F-measure metrics. In the evaluation, an SVM classifier was trained as the control.

The two RED classifiers achieved 80.9-83.0% in overall accuracy on the two datasets, which is 1.3-3% higher than SVM's accuracy (p<0.001). Similarly, small but consistent improvements have been observed in precision, recall, and F-measure when RED classifiers are compared with SVM alone. More significantly, RED+ALIGN correctly classified many instances that were misclassified by the SVM classifier (8.1-10.3% of the total instances and 43.8-53.0% of SVM's misclassifications).

Machine-generated regular expressions can be effectively used in clinical text classification. The regular expression-based classifier can be combined with other classifiers, like SVM, to improve classification performance.
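To make the idea of classifying snippets with regular expressions concrete, the following is a minimal Python sketch. It is not the RED algorithm, which the abstract does not specify: the patterns, the label names, and the helper functions classify and accuracy are all hypothetical and written by hand, whereas RED is described as discovering its expressions automatically from labeled data.

```python
import re

# Hypothetical, hand-written patterns for illustration only. RED is described
# as learning its expressions from labeled snippets; these labels and patterns
# are assumptions, not taken from the SMOKE dataset.
PATTERNS = {
    "non-smoker": [
        re.compile(r"\b(?:denies|no history of)\s+(?:smoking|tobacco)\b", re.I),
        re.compile(r"\b(?:non-?smoker|never smoked)\b", re.I),
    ],
    "past smoker": [
        re.compile(r"\b(?:quit|stopped)\s+smoking\b", re.I),
        re.compile(r"\b(?:former|ex-?)\s*smoker\b", re.I),
    ],
    "current smoker": [
        re.compile(r"\bsmokes\b", re.I),
        re.compile(r"\bcurrent(?:ly)?\s+(?:smoker|smoking|smokes)\b", re.I),
    ],
}


def classify(snippet, default="unknown"):
    """Return the label whose patterns match the snippet most often."""
    scores = {
        label: sum(bool(p.search(snippet)) for p in patterns)
        for label, patterns in PATTERNS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the default label when no pattern matches at all.
    return best if scores[best] > 0 else default


def accuracy(examples):
    """Fraction of (snippet, gold_label) pairs that classify() gets right."""
    correct = sum(classify(text) == gold for text, gold in examples)
    return correct / len(examples)


if __name__ == "__main__":
    samples = [
        ("Patient denies smoking.", "non-smoker"),
        ("She quit smoking in 2005.", "past smoker"),
        ("Currently smokes one pack per day.", "current smoker"),
    ]
    for text, _ in samples:
        print(text, "->", classify(text))
    print("accuracy:", accuracy(samples))
```

A rule set like this is easy to read and audit, which is part of the appeal of regular expressions in clinical NLP, but writing it by hand is exactly the manual effort the abstract aims to automate.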

Journal: Journal of the American Medical Informatics Association
Publication Date: Feb 27, 2014
Citations: 90
