Abstract 11: Development and Comparison of Two Natural Language Processing Methods for Identifying Bleeding Events in Clinical Text

Max Taggart, Yishuai Du, Wendy W Chapman, Arianna Pregenzer-Wenzler, Jeffery Ferraro, Shane Ruckel, Brian T Bucher, Donald M Lloyd-Jones, Matthew T Rondina, Benjamin A Steinberg, Rashmee U Shah

doi:10.1161/circoutcomes.11.suppl_1.11

Abstract

Background: Learning healthcare systems need techniques that can accurately and automatically identify health outcomes in large populations. Outcomes are often described in the narrative text of the electronic medical record.

Objective: To develop and compare two natural language processing (NLP) approaches, rules-based (RB) and machine-learning (ML), for identifying bleeding events in clinical notes.

Methods: We used de-identified notes from the Medical Information Mart for Intensive Care. We randomly selected 990 notes for a training set and 660 notes for a test set. Physicians labeled each note according to whether a clinically relevant bleeding event was present or absent during the hospitalization. We developed a dictionary of target and modifier words for the RB approach. In RB, the computer “reads” the text and tags each bleeding target as present or absent based on the modifier words; the mentions are then aggregated to arrive at a classification for the note. For the ML approach, each note was represented as a high-dimensional vector in which each dimension corresponds to the frequency of a certain word. Similar notes (e.g., bleeding-present notes) have similar vectors; the computer learns these patterns to predict the class of an unseen note. One RB model and three ML models (support vector machine (SVM), extra trees (ET), and convolutional neural network (CNN)) were trained using the full 990-note training set. Another instance of each ML model was also trained on a down-sampled (DS) set of 450 notes with equal numbers of positive and negative notes. We ran the trained models on the 660-note test set and compared classification performance using McNemar’s test.

Results: The 660-note test set represented 527 unique patients, 40% of whom were female. Bleeding events were present in 21% of the notes. The ET-DS model had the highest sensitivity, followed by the RB approach (93.8% versus 91.1%, p=0.44). The positive predictive value (PPV) of the ET-DS model, however, was <50%. The RB approach had the best overall performance, with 84.6% specificity, 62.7% PPV, and 97.1% negative predictive value (NPV) for identifying clinically relevant bleeding.

Discussion: Compared with ML, an RB NLP approach had the best overall performance in independently identifying bleeding events among critically ill patients. The current models have high NPV, so they could be used to reduce the chart review burden.
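
The RB approach described in the Methods (tag each bleeding target as present or absent from nearby modifier words, then aggregate mentions to a note-level label) can be illustrated with a minimal sketch. The word lists, the four-token look-back window, the sentence splitting, and the "any present mention makes the note positive" aggregation rule are all assumptions made for illustration; the abstract does not specify the study's dictionary or rules.

```python
import re

# Hypothetical mini-dictionary: the study's actual target and modifier
# lists are not given in the abstract.
TARGETS = {"bleeding", "hemorrhage", "hematoma", "melena"}
NEGATION_MODIFIERS = ["no", "denies", "without", "negative for"]
WINDOW = 4  # assumed: a modifier applies if it occurs up to 4 tokens before the target


def tag_mentions(text):
    """Return (target, status) pairs, one per target mention in the text."""
    mentions = []
    # Process one sentence at a time so a modifier cannot reach across sentences.
    for sentence in re.split(r"[.!?\n]+", text.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        for i, token in enumerate(tokens):
            if token not in TARGETS:
                continue
            context = " ".join(tokens[max(0, i - WINDOW):i])
            negated = any(
                re.search(rf"\b{re.escape(modifier)}\b", context)
                for modifier in NEGATION_MODIFIERS
            )
            mentions.append((token, "absent" if negated else "present"))
    return mentions


def classify_note(text):
    """Aggregate mentions: the note is 'present' if any mention is present (assumed rule)."""
    statuses = [status for _, status in tag_mentions(text)]
    return "present" if "present" in statuses else "absent"


if __name__ == "__main__":
    print(classify_note("Patient denies melena. No evidence of bleeding."))
    print(classify_note("No fever. Large retroperitoneal hematoma noted on CT."))
```

A real rule set would also need modifiers for history, hypotheticals, and family members, which this sketch omits.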
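
The word-frequency representation and the balanced down-sampling mentioned in the Methods can be sketched as follows. This is an illustration only: whitespace tokenization, the function names, and random selection of the majority class are assumptions, and the abstract does not describe how the study's 450-note down-sampled set was drawn.

```python
import random
from collections import Counter


def build_vocabulary(notes):
    """Map every distinct lowercase word in the notes to a vector dimension."""
    words = sorted({word for note in notes for word in note.lower().split()})
    return {word: index for index, word in enumerate(words)}


def vectorize(note, vocabulary):
    """Represent a note as word counts, one dimension per vocabulary word."""
    vector = [0] * len(vocabulary)
    for word, count in Counter(note.lower().split()).items():
        if word in vocabulary:  # words unseen during training are ignored
            vector[vocabulary[word]] = count
    return vector


def down_sample(notes, labels, seed=0):
    """Keep equal numbers of positive (1) and negative (0) notes, chosen at random."""
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(labels) if label == 1]
    negatives = [i for i, label in enumerate(labels) if label == 0]
    n = min(len(positives), len(negatives))
    keep = sorted(rng.sample(positives, n) + rng.sample(negatives, n))
    return [notes[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    notes = ["active bleeding noted", "no acute issues", "stable overnight", "bleeding bleeding at site"]
    labels = [1, 0, 0, 1]
    vocabulary = build_vocabulary(notes)
    print(vectorize("bleeding bleeding noted", vocabulary))
    print(down_sample(notes + ["doing well"], labels + [0]))
```

The resulting vectors could then be passed to any classifier; the SVM, ET, and CNN models named in the abstract are not reproduced here.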
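
The reported measures (sensitivity, specificity, PPV, NPV) and the paired comparison of two models on the same test notes can be sketched as below. The abstract names McNemar's test but not its variant; the exact two-sided binomial form used here is an assumption, as are the function names and the toy labels, which are not study data.

```python
from math import comb


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, and NPV for binary labels (1 = bleeding present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value computed from the discordant pairs."""
    a_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    b_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    n = a_only + b_only
    if n == 0:
        return 1.0  # the two models never disagree in correctness
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 0]
    model_a = [1, 1, 0, 0, 1, 0]
    model_b = [1, 0, 0, 1, 1, 0]
    print(confusion_metrics(y_true, model_b))
    print(mcnemar_exact(y_true, model_a, model_b))
```

To compare sensitivities specifically, the same function can be applied to only the notes whose reference label is positive.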
