Abstract Disclosure: E.M. Everett: None. R. Tiu: None. B. Zhu: None. T. Moin: None. A.T. Bui: None. Background: Insulin pumps offer patients with type 1 diabetes many benefits over multiple daily injection regimens. Unfortunately, disparities exist in pump therapy access and use, particularly among racial-ethnic minoritized groups and people with lower socioeconomic status or limited English proficiency. Because these groups are underrepresented in pump clinical trials, evaluating data from real-world populations, such as those in large health systems, is critical to improving technology use in groups that also experience disparities in diabetes outcomes. However, identifying diabetes technology users from electronic health record data is difficult: ICD and procedure codes for pump use are underutilized, and pump use is usually documented in unstructured form in clinical notes, which is hard to extract without manual chart review. Consequently, a novel approach is needed to capture this population on a large scale. We aim to develop an algorithm using natural language processing (NLP) to identify these technology users and non-users in health systems, which would allow us to study pump use in real-world settings. Methods: To develop the algorithm, we performed chart review and cataloged documentation styles (keywords, phrases) indicating pump use in 88 clinical notes of adults with type 1 diabetes, authored by physicians and nurse educators in UCLA endocrinology clinics from 2014 to 2022. After reaching saturation for documentation styles, we encoded the algorithm as a series of regular expressions (Regex), including code to negate phrases that may lead to false positives. Results: We tested the algorithm's performance in an 85-note sample that included 59 pump users and 26 non-users.
The algorithm detected 56/59 pump users and 26/26 non-pump users, yielding a sensitivity of 95% (95%CI 87-99%), a specificity of 100% (95%CI 86-100%), a positive predictive value of 100% (95%CI 94-100%), and a negative predictive value of 90% (95%CI 74-96%). In contrast, an approach using ICD and procedure codes identified only 17/59 pump users, with a sensitivity of 28% (95%CI 18-42%), a specificity of 100% (95%CI 87-100%), a positive predictive value of 100% (95%CI 81-100%), and a negative predictive value of 38% (95%CI 35-42%). Overall, the NLP algorithm identified ∼70% more pump users than the approach using billing codes. Conclusion: NLP is a promising approach for identifying pump users from unstructured electronic health record data. Next steps include testing the current algorithm in a larger sample of clinical notes and revising the algorithm as needed. The final algorithm will be used to understand patterns and disparities in pump prescribing over time and will be leveraged as a recruitment tool to identify potentially eligible patients for future studies aimed at addressing pump disparities. Presentation: 6/3/2024
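The regular-expression approach described in the Methods, keyword matching plus negation of phrases that could cause false positives, might look roughly like the sketch below. The abstract does not list the actual keywords, negation cues, or matching rules, so every pattern and the function name `mentions_pump_use` are hypothetical and serve only as an illustration.

```python
import re

# Hypothetical keyword patterns; the abstract does not list the real phrases.
PUMP_PATTERN = re.compile(
    r"\b(?:insulin\s+pump|pump\s+therapy|on\s+(?:an?\s+)?pump)\b",
    re.IGNORECASE,
)

# Hypothetical negation cues. A pump mention is ignored when one of these
# appears earlier in the same sentence.
NEGATION_PATTERN = re.compile(
    r"\b(?:no|not|never|without|denies|declined|declines|discontinued)\b",
    re.IGNORECASE,
)

# Naive sentence splitter: break on whitespace that follows ., ! or ?
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def mentions_pump_use(note: str) -> bool:
    """Return True if any sentence has a pump mention with no preceding negation."""
    for sentence in SENTENCE_SPLIT.split(note):
        for match in PUMP_PATTERN.finditer(sentence):
            preceding_text = sentence[: match.start()]
            if not NEGATION_PATTERN.search(preceding_text):
                return True
    return False


if __name__ == "__main__":
    examples = [
        "Patient is on an insulin pump with good control.",
        "Patient is not interested in an insulin pump. Continues MDI.",
    ]
    for text in examples:
        print(mentions_pump_use(text), "|", text)
```

This sketch only checks for negation cues that precede the mention within a sentence, so a phrase such as "discussed pump therapy; patient declined" would still be flagged. A production algorithm would need a broader catalog of documentation styles, like the one the authors built by chart review.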
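The point estimates in the Results follow directly from the reported counts. The sketch below recomputes them, assuming zero false positives for both approaches (implied by the reported 100% specificity and positive predictive value). The class and function names are illustrative, and confidence intervals are omitted because the abstract does not state how they were calculated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # pump users correctly identified
    fn: int  # pump users missed
    tn: int  # non-users correctly identified
    fp: int  # non-users wrongly flagged as pump users


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Standard diagnostic accuracy measures from a 2x2 confusion table."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Counts taken from the abstract: 59 pump users and 26 non-users in the test sample.
nlp_counts = ConfusionCounts(tp=56, fn=3, tn=26, fp=0)
code_counts = ConfusionCounts(tp=17, fn=42, tn=26, fp=0)

nlp_metrics = diagnostic_metrics(nlp_counts)
code_metrics = diagnostic_metrics(code_counts)

if __name__ == "__main__":
    for label, metrics in (("NLP", nlp_metrics), ("Billing codes", code_metrics)):
        summary = ", ".join(f"{name}={value:.1%}" for name, value in metrics.items())
        print(f"{label}: {summary}")
```

With these counts, 56/59 gives a sensitivity of about 94.9% and 26/29 a negative predictive value of about 89.7% for the NLP algorithm. For billing codes, 17/59 is about 28.8%, which the abstract reports as 28%.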