A Pilot Study to Improve the Use of Electronic Health Records for Identification of Patients with Social Determinants of Health Challenges: A Collaboration of Johns Hopkins Health System and Kaiser Permanente

Elham Hatef, Claudia Nau, Fagen Xie, Masoud Rouhizadeh, Lindsay Joe Lyons, Douglas Roblin, Mahmoud Abu‐Nasser, Ariadna Padilla, Christopher Rouillard

doi:10.1111/1475-6773.13756

Abstract

Research Objective

The International Classification of Diseases (ICD) coding system has codes for recording social determinants of health (SDOH); however, documentation of non‐clinical issues in electronic health records (EHRs) is less frequent than documentation of medical conditions. ICD codes in EHRs may therefore under‐report patients with social needs and risks, which makes it difficult for healthcare systems to target “high risk” patients for interventions addressing social needs.

SDOH may be discussed with healthcare providers during visits and, therefore, recorded in EHR free‐text notes (also known as providers' notes). These notes might provide a more accurate accounting of SDOH; however, traditional approaches to reviewing and abstracting patient information from medical record notes are laborious, expensive, and slow. Recent developments in text mining and natural language processing (NLP) of digitized text allow for reliable, low‐cost, and rapid extraction of information from EHRs.

In this pilot project, we evaluated whether an NLP algorithm could extract valid measures of SDOH from Epic‐based EHRs in three healthcare systems: Johns Hopkins Health System (JHHS), Kaiser Permanente Mid‐Atlantic States (KPMAS), and Kaiser Permanente Southern California (KPSCal). The focus of our study was residential instability (i.e., homelessness and housing insecurity).

Study Design

Each site conducted the study independently within a parallel, coordinated framework. The validation assessment and NLP algorithm logic were identical across sites; however, the “gold standard” for assessing algorithm validity differed according to data availability.

Using the EntityRuler module of the spaCy 2.3 Python toolkit, we created a rule‐based NLP system made up of 61 expert‐developed patterns that, if present, would represent residential instability.
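
As a rough illustration of how rule‐based pattern matching of this kind can work, the following is a minimal, library‐free sketch. It is not the study's spaCy EntityRuler pipeline, and none of the 61 expert‐developed patterns are reproduced here: the `LEMMAS` lookup, the `PATTERNS` list, and the function names are all hypothetical. Each pattern is a sequence of slots, and each slot lists the base forms accepted at that position, so inflected forms, alternative prepositions, and synonyms can all match the same rule.

```python
import re

# Hypothetical lemma lookup standing in for a real lemmatizer.
# The study used spaCy; this table exists only for illustration.
LEMMAS = {
    "lives": "live", "living": "live", "lived": "live",
    "shelters": "shelter", "evicted": "evict",
    "houses": "house", "homes": "home", "apartments": "apartment",
}

# Hypothetical patterns (NOT the study's 61 patterns). Each slot is a set of
# acceptable base forms, which covers synonyms and preposition substitutions.
PATTERNS = [
    [{"homeless"}],
    [{"live"}, {"in", "at"}, {"a", "the"}, {"shelter"}],
    [{"no", "without"}, {"stable", "permanent"},
     {"house", "home", "housing", "apartment"}],
    [{"evict"}, {"from"}, {"her", "his", "their", "the"},
     {"house", "home", "apartment"}],
]


def lemmatize(text):
    """Lowercase, tokenize on letters/apostrophes, and map tokens to base forms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [LEMMAS.get(token, token) for token in tokens]


def find_matches(text):
    """Return every lemma sequence in the text that satisfies some pattern."""
    lemmas = lemmatize(text)
    hits = []
    for pattern in PATTERNS:
        width = len(pattern)
        # Slide a window of the pattern's width over the lemma sequence.
        for start in range(len(lemmas) - width + 1):
            window = lemmas[start:start + width]
            if all(lemma in slot for lemma, slot in zip(window, pattern)):
                hits.append(" ".join(window))
    return hits


def flags_residential_instability(note):
    """A note is flagged if at least one pattern matches anywhere in it."""
    return bool(find_matches(note))


print(find_matches("She was evicted from her apartment."))
# ['evict from her apartment']
```

This sketch ignores negation and context (for example, a note stating that a patient denies housing problems), which a production system would need to handle.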
Our patterns included word lemmas (base forms) to account for morphological variations (e.g., singular and plural forms), as well as substitutions of different prepositions (e.g., about and for) and synonyms (e.g., house, apartment, and home).

We calibrated and then validated the algorithm using a split‐sample approach. Validity was assessed at each site by measures of sensitivity and specificity.

Population Studied

Beneficiaries ≥18 years of age who received care at JHHS, KPMAS, or KPSCal during 2016 through 2019.

Principal Findings

The following table presents the characteristics of the study population and the performance of the NLP algorithm at each study site.

| | JHHS | KPMAS | KPSCal |
| --- | --- | --- | --- |
| Study population (no. of patients) | ~1,200,000 | ~1,600,000 | ~4,700,000 |
| Validation sample size, patients/responses (with / without residential instability) | 1,000 (500 / 500) | 8,197 (833 / 7,364) | 300 (150 / 150) |
| Clinical notes (no.) | 134,062 | 78,825 | 9,575 |
| NLP algorithm sensitivity | 0.84 | 0.61 | 0.96 |
| NLP algorithm specificity | 0.96 | 0.87 | 0.97 |

NLP validation gold standard methods, as listed across sites: SDOH Questionnaire; SDOH Questionnaire; SDOH ICD codes; Manual Annotation.

Conclusions

The consistent performance of this NLP algorithm in identifying residential instability across three different healthcare systems suggests that the algorithm is generalizable. The consistent and relatively high sensitivity and specificity demonstrate the algorithm's validity.

Implications for Policy or Practice

Development of generalizable NLP algorithms with promising performance will enhance the value of EHRs for identifying at‐risk patients across different health systems, improving patient care and outcomes, and mitigating socioeconomic disparities across individuals and communities.

Primary Funding Source

Johns Hopkins and Kaiser Permanente Research Collaboration Committee Pilot Awards.
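
For readers less familiar with the validity measures used in this abstract, the sketch below shows how sensitivity and specificity are computed from gold standard labels and algorithm output, together with one simple way to form a split sample. It is an illustration only: the 50/50 split, the fixed seed, the function names, and the toy labels are all assumptions, since the abstract does not report the split proportions or patient‐level data.

```python
import random


def split_sample(records, calibration_fraction=0.5, seed=0):
    """Shuffle reproducibly, then split into calibration and validation sets.

    The 50/50 default is an assumption; the abstract does not state the ratio.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * calibration_fraction)
    return shuffled[:cut], shuffled[cut:]


def sensitivity_specificity(gold, predicted):
    """Compute (sensitivity, specificity) from paired binary labels.

    gold: 1 if the gold standard indicates residential instability, else 0.
    predicted: 1 if the algorithm flagged the patient, else 0.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # true positives
    fn = sum(1 for g, p in pairs if g and not p)      # false negatives
    tn = sum(1 for g, p in pairs if not g and not p)  # true negatives
    fp = sum(1 for g, p in pairs if not g and p)      # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("gold labels must contain both positives and negatives")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for illustration only; these are not study data.
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sensitivity, specificity = sensitivity_specificity(gold, predicted)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
# sensitivity=0.75, specificity=0.83
```

Sensitivity is the share of gold standard positives that the algorithm flags, and specificity is the share of gold standard negatives that it leaves unflagged, so each depends only on its own gold standard class and not on how the validation sample was balanced.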

Full Text