Abstract

Background
The severity of Alzheimer’s disease and related dementias (ADRD) is documented mostly in unstructured text in electronic health records (EHR). This information is important for clinical decision‐making, yet it is often “hidden” in free‐text fields and is therefore less readily available for clinicians to act upon than information in structured fields. This study assessed the feasibility of, and potential bias in, using keywords and rule‐based matching to obtain information about ADRD severity from EHR.

Method
We used EHR data from a large academic healthcare system that included patients with a primary discharge diagnosis of ADRD based on ICD‐9/10 codes between 2014 and 2020. The severity of ADRD was determined from clinicians’ notes based on (1) scores from the Mini Mental State Examination and the Montreal Cognitive Assessment, and (2) explicit terms for ADRD severity (e.g., “mild dementia”, “advanced AD”). A list of common ADRD symptoms, cognitive test names, and diagnosis stage terms was compiled and iteratively refined based on prior literature and clinical expertise. We used the list together with rule‐based pattern matching to identify the context in which each word or phrase was mentioned. The algorithm was developed in Python 3.8 using the spaCy and pandas libraries. We assessed the prevalence of documented ADRD severity and used logistic regression to examine whether the severity varied by patient characteristics.

Result
A total of 9,115 patients with over 65,000 provider notes were evaluated. Overall, 16.85% (N = 1,536) of patients were documented with mild ADRD, 17.95% (N = 1,636) were documented with moderate or severe ADRD, and 65.20% (N = 5,942) did not have any documentation of the severity of their ADRD. Compared with patients with mild ADRD, those documented with more advanced ADRD were older and were more likely to be female, to be Black, and to have received their diagnosis in a primary care or in‐hospital setting. Relative to patients with undocumented ADRD severity, those with documented ADRD severity had a similar distribution of sex, race, and rural/urban living environment.

Conclusion
This study demonstrated the value of unstructured EHR data and the feasibility of using a pattern‐matching algorithm to identify the severity of ADRD. Still, differences in documentation may introduce bias into the algorithm.
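
The keyword and rule‐based matching described in the Method can be illustrated with a minimal sketch. It is not the study’s algorithm: it uses Python’s standard `re` module instead of spaCy so that it runs without dependencies, and the term list, the negation cues, the score cutoffs, and the function names (`severity_from_score`, `classify_note`) are all hypothetical choices made for illustration. In particular, the MMSE and MoCA cutoffs below are assumed values, since the abstract does not state which thresholds were used.

```python
import re

# Hypothetical mapping of explicit severity terms to the two documented groups.
TERM_LABELS = {
    "mild": "mild",
    "moderate": "moderate/severe",
    "severe": "moderate/severe",
    "advanced": "moderate/severe",
}

# Explicit severity term followed by a dementia mention, e.g. "mild dementia".
TERM = re.compile(
    r"\b(mild|moderate|severe|advanced)\s+"
    r"(?:alzheimer'?s(?:\s+disease)?|dementia|ad)\b",
    re.IGNORECASE,
)

# Cognitive test name followed closely by a one- or two-digit score.
SCORE = re.compile(r"\b(MMSE|MoCA)\b[^0-9.]{0,20}?(\d{1,2})(?!\d)", re.IGNORECASE)

# Very simple negation cues checked in the same sentence, before the match.
NEGATION = re.compile(r"\b(?:no|not|without|denies|ruled? out)\b", re.IGNORECASE)

# Assumed, illustrative cutoffs (inclusive ranges); not taken from the study.
CUTOFFS = {
    "mmse": [(21, 26, "mild"), (0, 20, "moderate/severe")],
    "moca": [(18, 25, "mild"), (0, 17, "moderate/severe")],
}


def severity_from_score(test, score):
    """Map a test score to a severity label, or None if outside the ranges."""
    for low, high, label in CUTOFFS[test.lower()]:
        if low <= score <= high:
            return label
    return None


def classify_note(text):
    """Return 'mild', 'moderate/severe', or None for a single note."""
    labels = set()
    for match in TERM.finditer(text):
        # Only look at the current sentence fragment preceding the term.
        preceding = text[: match.start()].rsplit(".", 1)[-1]
        if NEGATION.search(preceding):
            continue  # skip negated mentions such as "no evidence of ..."
        labels.add(TERM_LABELS[match.group(1).lower()])
    for match in SCORE.finditer(text):
        label = severity_from_score(match.group(1), int(match.group(2)))
        if label is not None:
            labels.add(label)
    # Assumption: when mentions conflict, keep the most advanced stage.
    if "moderate/severe" in labels:
        return "moderate/severe"
    if "mild" in labels:
        return "mild"
    return None
```

A note with neither a qualifying score nor an explicit, non‐negated severity term returns `None`, which corresponds to the “undocumented severity” group. A production version would need the richer context handling that a library such as spaCy provides, for example tokenization, sentence segmentation, and more robust negation detection.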