Background
Prior studies characterizing worsening heart failure events (WHFE) have been limited to structured healthcare data from hospitalizations, with little exploration of sociodemographic variation. The current study examined the impact of incorporating unstructured data to identify WHFE and described age-, sex-, race and ethnicity-, and left ventricular ejection fraction (LVEF)-specific rates.

Methods
Adult members of Kaiser Permanente Southern California (KPSC) with a heart failure (HF) diagnosis between 2014 and 2018 were followed through 2019 to identify hospitalized WHFE. The main outcome was hospitalizations with a principal or secondary HF discharge diagnosis that met rule-based natural language processing (NLP) criteria for WHFE. For comparison, we examined hospitalizations with a principal discharge diagnosis of HF. Rates per 100 person-years (PY), adjusted for age, sex, and race and ethnicity, were calculated within age, sex, race and ethnicity (non-Hispanic [NH] Asian/Pacific Islander [API], Hispanic, NH Black, NH White), and LVEF subgroups.

Results
Among 44,863 adults with HF, 10,560 (23.5%) had an NLP-defined hospitalized WHFE. Adjusted rates of WHFE (per 100 PY) were higher using NLP than using HF principal discharge diagnosis codes alone (12.7 vs 9.3). The same pattern held across subgroups, with the highest rates among adults aged ≥75 years (16.3 vs 11.2), men (13.2 vs 9.7), NH Black adults (16.9 vs 14.3), Hispanic adults (15.3 vs 11.4), and adults with reduced LVEF (16.2 vs 14.0). Using NLP disproportionately increased the estimated burden of WHFE among NH API adults and adults with mid-range or preserved LVEF.

Conclusion
Rule-based NLP improved the capture of hospitalized WHFE beyond principal discharge diagnosis codes alone. Applying standardized consensus definitions to electronic health record (EHR) data may improve understanding of the burden of WHFE and promote optimal care, both overall and in specific sociodemographic groups.
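The rate arithmetic behind these results can be illustrated with a short Python sketch. It assumes that a rate is events divided by person-time. It also assumes that adjustment is done by direct standardization, which is one common method; the abstract does not state how the study adjusted its rates, and it may have used a regression model instead. The function names and the toy counts are hypothetical. Only the paired rates in `REPORTED` come from the Results above.

```python
"""Sketch of the rate arithmetic summarized in the abstract.

All helper names are hypothetical; they are not the study's code.
"""


def rate_per_100py(events: int, person_years: float) -> float:
    """Crude rate: number of events per 100 person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 100.0 * events / person_years


def directly_standardized_rate(
    stratum_rates: list[float], standard_weights: list[float]
) -> float:
    """Weighted mean of stratum-specific rates, weighted by the size of each
    stratum in a standard population.

    ASSUMPTION: direct standardization is shown only as one common adjustment
    method; the study may have adjusted differently (e.g., a regression model).
    """
    if len(stratum_rates) != len(standard_weights):
        raise ValueError("need exactly one weight per stratum")
    total = sum(standard_weights)
    return sum(r * w for r, w in zip(stratum_rates, standard_weights)) / total


def relative_increase(nlp_rate: float, code_rate: float) -> float:
    """Fractional increase of the NLP-based rate over the code-based rate."""
    return (nlp_rate - code_rate) / code_rate


# Adjusted rates per 100 PY from the Results: (NLP, principal diagnosis codes).
REPORTED = {
    "overall": (12.7, 9.3),
    "age >=75 years": (16.3, 11.2),
    "men": (13.2, 9.7),
    "NH Black": (16.9, 14.3),
    "Hispanic": (15.3, 11.4),
    "reduced LVEF": (16.2, 14.0),
}

if __name__ == "__main__":
    # Hypothetical toy inputs, not study data.
    print(rate_per_100py(50, 400.0))
    print(directly_standardized_rate([10.0, 20.0], [3, 1]))
    # Relative gain from NLP within each reported group.
    for group, (nlp, codes) in REPORTED.items():
        print(f"{group}: +{relative_increase(nlp, codes):.1%}")
```

Computed from the reported pairs, NLP raises the adjusted rate by about 37% overall. The increase is about 18% among NH Black adults and about 16% among adults with reduced LVEF. This is why a group can have the highest absolute rate without showing the largest relative gain from NLP.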