Comparison of 2 Natural Language Processing Methods for Identification of Bleeding Among Critically Ill Patients

Maxwell Taggart, Rashmee U Shah, Donald M Lloyd-Jones, Brian T Bucher, Yishuai Du, Shane Ruckel, Matthew T Rondina, Wendy W Chapman, Arianna Pregenzer-Wenzler, Jeffrey Ferraro, Benjamin A Steinberg

doi:10.1001/jamanetworkopen.2018.3451

Abstract

To improve patient safety, health care systems need reliable methods to detect adverse events in large patient populations. Events are often described in clinical notes rather than in structured data, which makes them difficult to identify on a large scale.

The objective was to develop and compare 2 natural language processing methods, a rules-based approach and a machine learning (ML) approach, for identifying bleeding events in clinical notes.

This diagnostic study used deidentified notes from the Medical Information Mart for Intensive Care, which spans 2001 to 2012. A training set of 990 notes and a test set of 660 notes were randomly selected. Physicians classified each note as present or absent for a clinically relevant bleeding event during the hospitalization. A bleeding dictionary was developed for the rules-based approach; bleeding mentions were then aggregated to arrive at a classification for each note. Three ML models (support vector machine, extra trees, and convolutional neural network) were developed and trained using the 990-note training set. Another instance of each ML model was also trained on a down-sampled set of 450 notes, with equal numbers of bleeding-present and bleeding-absent notes. The notes were represented using term frequency-inverse document frequency vectors and global vectors for word representation.

The main outcomes were accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for each model. Following training, the models were evaluated on the test set, and sensitivities were compared using a McNemar test.

The 990-note training set represented 769 patients (296 [38.5%] female; mean [SD] age, 67.42 [14.7] years). The 660-note test set represented 527 patients (211 [40.0%] female; mean [SD] age, 67.86 [14.7] years). Bleeding was present in 146 notes (22.1%). The down-sampled extra trees model and the rules-based approach were similarly sensitive (93.8% vs 91.1%; difference, 2.7%; 95% CI, −3.8% to 7.9%; P = .44). The positive predictive value for the extra trees model, however, was 48.6%. The rules-based model had the best performance overall, with 84.6% specificity, 62.7% positive predictive value, and 97.1% negative predictive value.

Bleeding is a common complication in health care, and these results demonstrate an automated and scalable detection method. The rules-based natural language processing approach, compared with ML, had the best performance in identifying bleeding, with high sensitivity and negative predictive value.
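The rules-based metrics reported above can be checked with a short worked example. The sketch below assumes that the 146 bleeding-present notes belong to the 660-note test set (146/660 = 22.1%) and back-calculates the one confusion matrix that reproduces the reported percentages after rounding: 133 true positives, 13 false negatives, 79 false positives, and 435 true negatives. These counts are an inference, not values stated in the abstract, and the names `Confusion` and `diagnostic_metrics` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Note-level confusion matrix (counts of notes)."""
    tp: int  # bleeding present, flagged
    fn: int  # bleeding present, missed
    fp: int  # bleeding absent, flagged
    tn: int  # bleeding absent, not flagged


def diagnostic_metrics(c: Confusion) -> dict[str, float]:
    """Compute the five outcome measures named in the abstract."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# ASSUMPTION: counts back-calculated from the reported percentages,
# not taken from the paper's tables.
rules_based = Confusion(tp=133, fn=13, fp=79, tn=435)

for name, value in diagnostic_metrics(rules_based).items():
    print(f"{name}: {value * 100:.1f}%")
```

Under these assumed counts, sensitivity, specificity, positive predictive value, and negative predictive value round to the reported 91.1%, 84.6%, 62.7%, and 97.1%. The low positive predictive value relative to sensitivity reflects the 22.1% prevalence: even a modest false-positive rate among the 514 bleeding-absent notes produces many false alarms.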

Highlights

Bleeding is a common complication in health care and is associated with increased morbidity, mortality, and health care costs.[1,2] Big data approaches, incorporating genetic, sociodemographic, medical, and environmental exposures, provide an opportunity to create powerful models to predict which patients are likely to bleed.
The down-sampled extra trees model and the rules-based approach were similarly sensitive (93.8% vs 91.1%; difference, 2.7%; 95% CI, −3.8% to 7.9%; P = .44).
Bleeding is a common complication in health care, and these results demonstrate an automated and scalable detection method.
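The sensitivity comparison is a paired one, because both methods classify the same bleeding-present notes, which is why a McNemar test applies. A minimal sketch of the exact (binomial) form of that test follows. It depends only on the discordant pairs: `b` notes flagged by the first method only and `c` notes flagged by the second method only. The abstract does not report these counts, so the values used below are hypothetical, and whether the authors used the exact or the chi-square form of the test is not stated.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: pairs where only method A is correct
    c: pairs where only method B is correct
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# HYPOTHETICAL discordant counts for illustration only; they are not
# the study's data.
p_value = mcnemar_exact(b=9, c=5)
print(f"exact McNemar p = {p_value:.3f}")
```

Concordant notes (both methods right or both wrong) do not enter the calculation, so two methods with very different overall accuracy can still yield a non-significant result when few notes are classified differently.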

Summary

Introduction

Bleeding is a common complication in health care and is associated with increased morbidity, mortality, and health care costs.[1,2] Big data approaches, incorporating genetic, sociodemographic, medical, and environmental exposures, provide an opportunity to create powerful models to predict which patients are likely to bleed. The goal of this study was to develop and compare different natural language processing (NLP) algorithms for accurately identifying bleeding events in clinical notes. Clinicians read the medical record to find out whether their patients bled, but this approach cannot be scaled across entire health care systems. With NLP, the computer can “read” the note and extract bleeding events in a computable form for use in predictive models, quality and safety monitoring, and clinical decision support systems.
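To make the idea of a computer "reading" a note concrete, here is a minimal sketch of a dictionary-driven, rules-based classifier in the spirit of the approach the abstract describes (find bleeding mentions, then aggregate them into one label per note). Everything specific in it is an assumption: the term list, the negation cues, the 30-character look-back window, and the "any affirmed mention makes the note positive" aggregation rule are all hypothetical and are not the study's dictionary or rules.

```python
import re

# HYPOTHETICAL mini-dictionary and negation cues, for illustration only.
BLEEDING_TERMS = ["bleeding", "hemorrhage", "hematoma", "melena",
                  "hematemesis", "epistaxis"]
NEGATION_CUES = ["no", "denies", "without", "negative for"]

TERM_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, BLEEDING_TERMS)) + r")\b", re.IGNORECASE)
NEG_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, NEGATION_CUES)) + r")\b", re.IGNORECASE)


def find_mentions(note: str, window: int = 30) -> list[tuple[str, bool]]:
    """Return (term, affirmed) for every dictionary hit in the note.

    A hit counts as negated when a negation cue appears within the
    preceding `window` characters of the same sentence.
    """
    mentions = []
    for sentence in re.split(r"[.;\n]", note):
        for match in TERM_RE.finditer(sentence):
            context = sentence[max(0, match.start() - window):match.start()]
            affirmed = NEG_RE.search(context) is None
            mentions.append((match.group(0).lower(), affirmed))
    return mentions


def classify_note(note: str) -> bool:
    """Aggregate mentions: bleeding-present if any mention is affirmed."""
    return any(affirmed for _, affirmed in find_mentions(note))


print(classify_note("Patient denies bleeding. No hematoma at the site."))
print(classify_note("Large retroperitoneal hematoma noted; transfused 2 units."))
```

The first note contains two dictionary hits, both negated, so it is classified as bleeding-absent; the second contains an affirmed mention and is classified as bleeding-present. A production system would need far richer handling of negation scope, history ("prior GI bleed"), and hypothetical statements ("monitor for bleeding") than this sketch attempts.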


Journal: JAMA Network Open
Publication Date: Oct 12, 2018
Citations: 38
License type: cc-by


