Abstract
Introduction: Stroke research using widely available institutional, state-wide, and national retrospective data depends on accurate identification of stroke subtypes from claims data. Despite the abundance of such data and advances in clinical informatics, there are limited published data on applying machine learning models to improve previously reported administrative stroke identification algorithms.
Hypothesis: We hypothesized that machine learning models can be applied to claims data coded using the International Classification of Diseases, Ninth Revision (ICD-9), to accurately identify patients with ischemic stroke (IS), intracerebral hemorrhage (ICH), and subarachnoid hemorrhage (SAH), and that these models would outperform previously published algorithms in our patient cohort.
Methods: We developed a gold standard list of 427 stroke patients consecutively admitted to our institution from 1/1/2015 to 9/30/2015 using an internal stroke database. We used 75% of it to train and 25% to test two machine learning models: one using classification and regression tree (CART) and another using regularized logistic regression. There were 2,241 negative controls. For comparison, we also applied a previously reported stroke detection algorithm, by Tirschwell and Longstreth, to our cohort.
Results: The CART model had a κ of 0.72, 0.82, 0.59; a sensitivity of 95%, 99%, 99%; and a specificity of 88%, 78%, 75% for IS, ICH, and SAH, respectively. The regularized logistic regression model had a κ of 0.73, 0.80, 0.59; a sensitivity of 95%, 99%, 99%; and a specificity of 89%, 78%, 75% for IS, ICH, and SAH, respectively. The previously reported algorithm by Tirschwell and Longstreth had a κ of 0.71, 0.56, 0.64; a sensitivity of 98%, 99%, 99%; and a specificity of 64%, 52%, 50% for IS, ICH, and SAH, respectively.
Conclusion: Compared with the previously reported ICD-9-based detection algorithm, the machine learning models had a higher κ for diagnosis of IS and ICH, similar sensitivity for all subtypes, and higher specificity for all stroke subtypes in our cohort. Applying machine learning models to administrative data sets can yield highly accurate stroke subtype identification for health services researchers.
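The abstract reports κ, sensitivity, and specificity for each stroke subtype but does not say how they were computed. As a rough illustration, the sketch below derives the three measures from gold-standard and predicted labels for a single subtype, treated as a binary problem (1 = subtype present, 0 = absent). It assumes the standard definitions, with Cohen's κ as the agreement statistic. The names `BinaryMetrics` and `binary_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float
    specificity: float
    kappa: float


def binary_metrics(y_true, y_pred):
    """Compare predicted labels with gold-standard labels (1 = subtype present)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    n = tp + tn + fp + fn

    # Sensitivity: share of true cases that were flagged.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of non-cases that were correctly left unflagged.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected != 1 else float("nan")

    return BinaryMetrics(sensitivity, specificity, kappa)


# Toy labels for illustration only (not study data).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
toy_metrics = binary_metrics(toy_true, toy_pred)
print(toy_metrics)
```

In a multi-subtype setting like the one described, a function of this kind would be called once per subtype (IS, ICH, SAH), each time recoding the labels as that subtype versus everything else.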