Word2Vec inversion and traditional text classifiers for phenotyping lupus

Clayton A Turner, Alexander D Jacobs, Paul E Anderson, Cassios K Marques, James C Oates, Diane L Kamen, Jihad S Obeid

doi:10.1186/s12911-017-0518-1

Abstract

Background: Identifying patients with certain clinical criteria based on manual chart review of doctors’ notes is a daunting task given the massive amounts of text notes in the electronic health records (EHR). This task can be automated using text classifiers based on Natural Language Processing (NLP) techniques along with pattern recognition machine learning (ML) algorithms. The aim of this research is to evaluate the performance of traditional classifiers for identifying patients with Systemic Lupus Erythematosus (SLE) in comparison with a newer Bayesian word vector method.

Methods: We obtained clinical notes for patients with an SLE diagnosis along with controls from the Rheumatology Clinic (662 total patients). Sparse bag-of-words (BOW) and Unified Medical Language System (UMLS) Concept Unique Identifier (CUI) matrices were produced using NLP pipelines. These matrices were subjected to several different NLP classifiers: neural networks, random forests, naïve Bayes, support vector machines, and Word2Vec inversion, a Bayesian inversion method. Performance was measured by calculating accuracy and area under the Receiver Operating Characteristic (ROC) curve (AUC) of a cross-validated (CV) set and a separate testing set.

Results: As a baseline, the ICD-9 billing codes achieved an accuracy of 90.00% with an AUC of 0.900. The shallow neural network with CUIs achieved 92.10% with an AUC of 0.970, the random forest with BOWs 95.25% with an AUC of 0.994, the random forest with CUIs 95.00% with an AUC of 0.979, and the Word2Vec inversion 90.03% with an AUC of 0.905.

Conclusions: Our results suggest that a shallow neural network with CUIs and random forests with both CUIs and BOWs are the best classifiers for this lupus phenotyping task. The Word2Vec inversion method failed to significantly beat the ICD-9 code classification, but yielded promising results. This method does not require explicit features and is more adaptable to non-binary classification tasks. The Word2Vec inversion is hypothesized to become more powerful with access to more data. For now, therefore, the shallow neural networks and random forests are the preferable classifiers.
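The abstract does not spell out how the Bayesian inversion step works. A minimal sketch of the general idea, assuming each class (SLE, control) has its own language model that assigns a log-likelihood to a note, is to combine those likelihoods with class priors through Bayes' rule. The log-likelihood values below are hypothetical placeholders, and the function name `invert_to_posterior` is ours, not the authors'; the priors use the 322/340 class counts reported for the dataset.

```python
import math


def invert_to_posterior(log_likelihoods, priors):
    """Bayes' rule in log space: P(class | note) is proportional to P(note | class) * P(class)."""
    log_joint = {c: log_likelihoods[c] + math.log(priors[c]) for c in log_likelihoods}
    peak = max(log_joint.values())  # subtract the maximum for numerical stability
    unnormalized = {c: math.exp(v - peak) for c, v in log_joint.items()}
    total = sum(unnormalized.values())
    return {c: u / total for c, u in unnormalized.items()}


# Hypothetical log-likelihoods of one note under each class-specific model (assumption).
scores = {"SLE": -120.0, "control": -123.0}
# Priors taken from the class counts in the dataset (322 SLE, 340 controls).
priors = {"SLE": 322 / 662, "control": 340 / 662}

posterior = invert_to_posterior(scores, priors)
predicted = max(posterior, key=posterior.get)
print(predicted, round(posterior[predicted], 3))
```

Because the posterior is normalized over whatever classes are supplied, the same function extends to more than two classes, which is consistent with the remark that the method adapts to non-binary tasks.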

Highlights

Identifying patients with certain clinical criteria based on manual chart review of doctors’ notes is a daunting task given the massive amounts of text notes in the electronic health records (EHR).
EHR clinical notes: Following Institutional Review Board for Human Research (IRB) approval, we acquired a dataset of 662 patients from the Rheumatology Clinic: 322 patients diagnosed with Systemic Lupus Erythematosus (SLE) and 340 controls.
The algorithms with the best area under the receiver operating characteristic curve (AUC) and accuracy, shown in Tables 2 and 3, are neural networks with Concept Unique Identifiers (CUIs), random forests with BOWs and CUIs, and the Word2Vec Bayesian inversion.
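For readers unfamiliar with the two reported metrics, the following is a small illustrative sketch of how accuracy and AUC can be computed from labels and classifier scores. It is not the authors' evaluation code; the labels, scores, and the 0.5 decision threshold are made-up assumptions. AUC is computed here as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def accuracy(labels, predictions):
    """Fraction of cases where the predicted label matches the true label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = SLE, 0 = control (assumption, not study data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

print(accuracy(labels, predictions), roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two numbers are reported side by side.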

Summary

Introduction

Identifying patients with certain clinical criteria based on manual chart review of doctors’ notes is a daunting task given the massive amounts of text notes in the electronic health records (EHR). This task can be automated using text classifiers based on Natural Language Processing (NLP) techniques along with pattern recognition machine learning (ML) algorithms. ICD codes have been known to be prone to errors due to a variety of problems in the coding and billing workflows [1,2,3]. This is problematic as clinicians use … Classically, departing from ICD-9 codes would necessitate the use of NLP to extract features suitable for ML from the clinical notes in the EHR. The aim of this research is to evaluate combinations of NLP techniques and ML algorithms alongside a newer inversion-based method which utilizes word vectors based on Word2Vec and which does not have the problem of feature generation [4].
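To illustrate what "extracting features suitable for ML" means for the traditional classifiers, here is a minimal sketch of a sparse bag-of-words representation. The tokenizer, the helper names, and the two example notes are illustrative assumptions; the study's actual NLP pipeline is not described in this section.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(notes):
    """Return a vocabulary (token -> column index) and one sparse row per note."""
    vocab = {}
    rows = []
    for note in notes:
        row = {}
        for token, count in Counter(tokenize(note)).items():
            column = vocab.setdefault(token, len(vocab))
            row[column] = count  # store only non-zero counts
        rows.append(row)
    return vocab, rows


# Invented example notes, not patient data.
notes = ["Malar rash and joint pain.", "No rash; joint pain resolved."]
vocab, rows = bag_of_words(notes)
print(len(vocab), rows)
```

Each row can then be passed to a classifier such as a random forest. The inversion-based method avoids this explicit matrix by working from the word sequences directly.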

Journal: BMC Medical Informatics and Decision Making
Publication Date: Aug 22, 2017
Citations: 55
License type: open-access
