Alcohol use carries significant morbidity and mortality, yet accurate identification of alcohol use disorder (AUD) remains a multi-layered problem for both researchers and clinicians. We aimed to fine-tune a language model to classify AUD in the clinical narrative and to detect AUD not captured by ICD-9 coding in the MIMIC-III database. We applied clinicalBERT to unique patient discharge summaries. For classification, patients were divided into nonoverlapping groups, stratified by the presence or absence of an AUD ICD diagnosis, for model training (80%), validation (10%), and testing (10%). For detection, the model was trained (80%) and validated (20%) on a 1:1 ratio of positive to negative patients, then applied to the remaining negative patient population. Physicians adjudicated 600 samples drawn from the full model confidence spectrum to confirm AUD by Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) criteria. The model exhibited the following characteristics (mean, standard deviation): precision (0.9, 0.02), recall (0.65, 0.03), F1 (0.75, 0.02), area under the receiver operating characteristic curve (0.97, 0.01), and area under the precision-recall curve (0.86, 0.01). Adjudication produced an estimated 4% under-documentation rate for the total study population. As model confidence increased, the AUD under-documentation rate rose to 30% of the number of patients identified as positive by ICD-9 coding. Our model improves the identification of patients meeting AUD criteria, outperforming ICD codes in detecting cases of AUD. The detection discrepancy between ICD codes and free text highlights clinician under-documentation, not under-recognition. Adjudication revealed model over-sensitivity to language around substance use, withdrawal, and chronic liver disease; future work should apply the model to patients spanning a broader range of ages and acuity levels. This model has the potential to improve rapid identification of patients with AUD and enhance treatment allocation.
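The stratified, nonoverlapping 80/10/10 patient split described for the classification task can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the authors' code: the function name `stratified_split`, the toy `labels` dictionary (patient ID mapped to a 0/1 AUD ICD flag), the 10% positive rate, and the random seed are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split patient IDs into nonoverlapping groups, preserving the label ratio.

    labels: dict mapping patient_id -> 0/1 (hypothetical AUD ICD flag).
    Returns one list of patient IDs per entry in `fractions`.
    """
    rng = random.Random(seed)

    # Group patients by label so each class is split separately.
    by_label = defaultdict(list)
    for pid, y in labels.items():
        by_label[y].append(pid)

    splits = [[] for _ in fractions]
    for y in sorted(by_label):
        ids = sorted(by_label[y])  # fixed order before shuffling, for reproducibility
        rng.shuffle(ids)
        n = len(ids)
        start, cumulative = 0, 0.0
        for i, frac in enumerate(fractions):
            cumulative += frac
            # The last group takes the remainder so every patient is assigned once.
            end = n if i == len(fractions) - 1 else round(cumulative * n)
            splits[i].extend(ids[start:end])
            start = end
    return splits


# Toy data (assumption): 1000 patients, 10% carrying an AUD ICD code.
labels = {f"p{i}": int(i % 10 == 0) for i in range(1000)}
train, val, test = stratified_split(labels)
```

Splitting each label group separately is what keeps the proportion of ICD-positive patients the same in training, validation, and test sets, and assigning each patient ID to exactly one group keeps the sets nonoverlapping.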