Abstract WMP71: Using Machine Learning to Classify Ischemic Stroke Subtype From Electronic Health Records

Ravi P Garg, Konrad P Kording, Elissa Oh, Andrew M Naidech, Shyam Prabhakaran, Marc B Rosenman

doi:10.1161/str.49.suppl_1.wmp71

Abstract

Objective: The TOAST classification for ischemic stroke (IS) is critical to determining management and predicting outcome. Adjudication is performed manually by highly trained stroke clinicians, a process that is time-consuming, error-prone, and difficult to scale to large datasets. Electronic medical records (EMR) could be leveraged to automate this process. We hypothesized that machine learning-enabled natural language processing (NLP) for multiclass classification could determine the TOAST subtype from free text stored in the EMR.
Methods: We selected 1099 IS patients from an observational registry with TOAST subtyping confirmed by board-certified vascular neurologists. We analyzed text-based EMR data, including progress notes and radiology reports. For each patient, we concatenated the notes into a single document and tokenized it into a “bag of words” representation using n-grams (unigrams, bigrams, and trigrams). We used five-fold cross-validation to guard against overfitting. To reduce the high dimensionality of the features, we used principal component analysis (150 components) and L1-regularized logistic regression, and then combined the features thus obtained within each fold. Next, we applied several classification methods (K-nearest neighbors, Support Vector Machines, Random Forests, Extra Trees classifiers, Gradient Boosting Machines, Extreme Gradient Boosting, and stacked ensembles) to assess the accuracy and discrimination of machine learning techniques for TOAST subtyping compared with manual subtyping (the gold standard). We performed receiver operating characteristic analysis to assess the discrimination of each model.
Results: Our best classification method achieved an accuracy of 41 +/- 5% using radiology reports alone and 64 +/- 4% using progress notes alone. Combining radiology reports and progress notes, we achieved an accuracy of 66 +/- 5% with high discrimination (90 +/- 4%).
Conclusions: Compared to manual approaches, automated machine learning and NLP can discriminate TOAST subtypes using EMR data with moderate accuracy and high discrimination. The automated pipeline, if validated, could enable large-scale stroke epidemiology research using EHRs nationwide.
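
The text-preparation steps described in the Methods (concatenating each patient's notes, building an n-gram bag of words, and splitting patients into five folds) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the tokenization rule, the round-robin fold assignment, and the example notes are all assumptions made for illustration, and the abstract does not say which tools were actually used.

```python
import re
from collections import Counter


def patient_document(notes):
    """Concatenate all of one patient's notes into a single document."""
    return "\n".join(notes)


def ngram_counts(text, n_max=3):
    """Count word n-grams for n = 1..n_max (unigrams, bigrams, trigrams).

    Assumption: tokens are lowercase alphanumeric runs. Because the notes
    are concatenated first, n-grams may span a boundary between two notes.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def fold_indices(n_samples, k=5):
    """Assign sample indices to k folds round-robin.

    Assumption: no shuffling or stratification by subtype is done here;
    a real analysis would likely want both.
    """
    return [list(range(fold, n_samples, k)) for fold in range(k)]


# Hypothetical notes, invented for illustration only.
notes = [
    "Acute infarct in left MCA territory.",
    "Atrial fibrillation noted on telemetry.",
]
features = ngram_counts(patient_document(notes))
folds = fold_indices(10, k=5)
```

Here `features` maps each n-gram to its count for one patient, and `folds` lists which patient indices are held out in each of the five rounds. Dimensionality reduction and classifier fitting would then be done on the training folds only, so that the held-out fold does not influence feature selection.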
