Background and Aims: Although metabolic dysfunction-associated steatotic liver disease (MASLD) and MASLD with increased alcohol intake (MetALD) are recognized clinical entities, tools to identify affected patients in electronic health records (EHRs) for large outcome studies are lacking. Methods: In this retrospective study of participants in the Veterans Analysis of Liver Disease (VALID) cohort, assembled from 1/1/2013 to 12/31/2022, a rule-based natural language processing (NLP) algorithm searched EHRs for imaging evidence of hepatic steatosis. Its output was combined with identification of cardiometabolic risk factors (CMRFs) and harmful alcohol use. Algorithm-derived diagnoses of MASLD, MetALD, alcohol-associated steatotic liver disease (ALD), and no steatotic liver disease (SLD) were validated by blinded review of randomly selected charts. Results: Among 817,657 eligible Veterans, SLD was present in over half (n=438,209, 53.5%), including MASLD in 299,259 (36.5%), MetALD in 99,163 (12.1%), and ALD in 38,552 (4.7%). The NLP algorithm showed high agreement with chart review for steatosis, with a kappa of 0.86 (95% CI 0.82-0.90), sensitivity of 0.96, and specificity of 0.90. Classification of MASLD, MetALD, ALD, and no SLD by the algorithm also showed high agreement with chart review, with a kappa of 0.87 (95% CI 0.82-0.91). The algorithm identified MASLD in 299,259 (36.5%) members of the study cohort, compared with 23,218 (2.8%) identified using ICD-9/10 codes. Conclusions and Relevance: An algorithm combining rule-based NLP with CMRFs and alcohol use from EHRs accurately identifies and classifies SLD and can be applied in large epidemiologic studies of SLD in the Veterans Health Administration (VHA).
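For readers less familiar with the validation metrics reported above, the following is a minimal sketch of how sensitivity, specificity, and Cohen's kappa are conventionally computed from a 2x2 table comparing a binary algorithm result (steatosis present or absent) with chart review as the reference. The class and function names are hypothetical and are not taken from the study. The abstract does not report the underlying counts, so none are filled in here. The four-class comparison (MASLD, MetALD, ALD, no SLD) would need a 4x4 table, which this sketch does not cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical container: algorithm result vs. chart review (reference)."""
    tp: int  # algorithm positive, chart review positive
    fp: int  # algorithm positive, chart review negative
    fn: int  # algorithm negative, chart review positive
    tn: int  # algorithm negative, chart review negative


def sensitivity(t: TwoByTwo) -> float:
    # Share of chart-review positives that the algorithm also flags.
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    # Share of chart-review negatives that the algorithm also clears.
    return t.tn / (t.tn + t.fp)


def cohen_kappa(t: TwoByTwo) -> float:
    n = t.tp + t.fp + t.fn + t.tn
    observed = (t.tp + t.tn) / n
    # Agreement expected by chance, from the row and column marginals.
    chance_pos = ((t.tp + t.fp) / n) * ((t.tp + t.fn) / n)
    chance_neg = ((t.fn + t.tn) / n) * ((t.fp + t.tn) / n)
    expected = chance_pos + chance_neg
    # Undefined (division by zero) when expected == 1, i.e. when both
    # raters place every chart in the same single category.
    return (observed - expected) / (1 - expected)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is reported alongside sensitivity and specificity rather than in place of them.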