Enhancing early detection of cognitive decline in the elderly: a comparative study utilizing large language models in clinical notes

Xinsong Du, John Novoa-Laurentiev, Joseph M Plasek, Ya-Wen Chuang, Liqin Wang, Gad A Marshall, Stephanie K Mueller, Frank Chang, Surabhi Datta, Hunki Paek, Bin Lin, Qiang Wei, Xiaoyan Wang, Jingqi Wang, Hao Ding, Frank J Manion, Jingcheng Du, David W Bates, Li Zhou

doi:10.1016/j.ebiom.2024.105401

Abstract

Background

Large language models (LLMs) have shown promising performance in various healthcare domains, but their effectiveness in identifying specific clinical conditions in real medical records has been less explored. This study evaluates LLMs for detecting signs of cognitive decline in real electronic health record (EHR) clinical notes and compares their error profiles with those of traditional models. The insights gained will inform strategies for improving performance.

Methods

This study, conducted at Mass General Brigham in Boston, MA, analysed clinical notes from the four years preceding a 2019 diagnosis of mild cognitive impairment in patients aged 50 and older. We developed prompts for two LLMs, Llama 2 and GPT-4, on Health Insurance Portability and Accountability Act (HIPAA)-compliant cloud-computing platforms, using multiple approaches (e.g., hard prompting, retrieval-augmented generation, and error analysis-based instructions) to select the optimal LLM-based method. Baseline models included a hierarchical attention-based neural network and XGBoost. We then combined three models (the selected LLM-based method and the two baselines) into an ensemble using majority voting. Models were evaluated with confusion-matrix-based scores such as precision, recall, and F1-score.

Findings

For model development, we used a random sample of 4,949 annotated note sections from 1,969 patients (women: 1,046 [53.1%]; age: mean, 76.0 [SD, 13.3] years), filtered with keywords related to cognitive functions. For testing, we used a random annotated sample of 1,996 note sections from 1,161 patients (women: 619 [53.3%]; age: mean, 76.5 [SD, 10.2] years) without keyword filtering. GPT-4 was more accurate and efficient than Llama 2 but did not outperform the traditional models.
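
The abstract does not include code. As a minimal sketch, assuming each model emits one binary label per note section (1 = signs of cognitive decline, 0 = none), the majority-vote ensemble and the confusion-matrix-based scores described in the Methods could be computed as below. The function names, the model variable names (llm, han, xgb) and the toy labels are hypothetical illustrations, not the study's implementation or data.

```python
def majority_vote(*model_predictions: list[int]) -> list[int]:
    """Combine per-sample binary predictions from several models by majority vote.

    A sample is labelled 1 when more than half of the models predict 1.
    With an odd number of voters (three here) a tie cannot occur.
    """
    n_samples = len(model_predictions[0])
    if any(len(preds) != n_samples for preds in model_predictions):
        raise ValueError("all models must score the same samples")
    votes_needed = len(model_predictions) // 2 + 1
    return [int(sum(votes) >= votes_needed) for votes in zip(*model_predictions)]


def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def precision_recall_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float, float]:
    """Confusion-matrix-based scores; a score is 0.0 when its denominator is zero."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example with hypothetical labels for six note sections.
y_true = [1, 1, 0, 0, 1, 0]
llm = [1, 0, 1, 0, 1, 0]  # hypothetical LLM-based predictions
han = [1, 1, 0, 1, 0, 0]  # hypothetical attention-network predictions
xgb = [0, 1, 0, 0, 1, 1]  # hypothetical XGBoost predictions

ensemble = majority_vote(llm, han, xgb)
print(ensemble)                               # [1, 1, 0, 0, 1, 0]
print(precision_recall_f1(y_true, ensemble))  # (1.0, 1.0, 1.0)
print(precision_recall_f1(y_true, llm))       # precision, recall and F1 each about 0.667
```

In this toy case each model errs on different sections, so the vote recovers every true label; with three voters, the majority is correct exactly when at most one model is wrong on a sample.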
The ensemble model outperformed each individual model on all evaluation metrics (p<0.01), achieving a precision of 90.2% [95% CI: 81.9%-96.8%], a recall of 94.2% [95% CI: 87.9%-98.7%], and an F1-score of 92.1% [95% CI: 86.8%-96.4%]. Notably, compared with the best-performing single model, the ensemble substantially improved precision, from a range of 70%-79% to above 90%. Error analysis showed that 63 samples were misclassified by at least one model, but only 2 of these (3.2%) were misclassified by all models, indicating diverse error profiles.

Interpretation

LLMs and traditional machine learning models trained on local EHR data exhibited diverse error profiles. Because their errors were complementary, an ensemble of these models improved diagnostic performance. Future research should investigate integrating LLMs with smaller, localised models and incorporating medical data and domain knowledge to improve performance on specific tasks.
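
The error analysis counts samples misclassified by at least one model and those misclassified by every model (2 of 63, about 3.2%). A minimal Python sketch of that overlap count follows, assuming binary labels and one prediction list per model; the function name error_overlap and the toy data are hypothetical and not taken from the study.

```python
def error_overlap(
    y_true: list[int], predictions: dict[str, list[int]]
) -> tuple[set[int], set[int], float]:
    """Return (samples wrong for >= 1 model, samples wrong for all models, mutual share)."""
    if not predictions:
        raise ValueError("need at least one model")
    # Indices of the samples each model gets wrong.
    wrong = {
        name: {i for i, (t, p) in enumerate(zip(y_true, preds)) if t != p}
        for name, preds in predictions.items()
    }
    any_wrong = set().union(*wrong.values())
    all_wrong = set.intersection(*wrong.values())
    # Fraction of the "wrong somewhere" samples that every model gets wrong.
    mutual_share = len(all_wrong) / len(any_wrong) if any_wrong else 0.0
    return any_wrong, all_wrong, mutual_share


# Hypothetical labels for seven note sections; the last one fools every model.
y_true = [1, 1, 0, 0, 1, 0, 1]
preds = {
    "llm": [1, 0, 1, 0, 1, 0, 0],
    "han": [1, 1, 0, 1, 0, 0, 0],
    "xgb": [0, 1, 0, 0, 1, 1, 0],
}
any_wrong, all_wrong, share = error_overlap(y_true, preds)
print(len(any_wrong), sorted(all_wrong), round(share, 3))  # 7 [6] 0.143
```

Applied to the study's counts, the same ratio is 2 / 63, roughly 0.032. A low mutual share is what gives a vote room to help: a sample that every model misclassifies cannot be rescued by majority voting.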


Journal: eBioMedicine
Publication date: Nov 1, 2024
Citations: 1

Similar Papers

Comparison of the Diagnostic Accuracy of Five Cognitive Screening Tests for Diagnosing Mild Cognitive Impairment in Patients Consulting for Memory Loss.
María Valles-Salgado ... Jorge Matías-Guiu
Journal of clinical medicine | VOL. 13
09 Aug 2024

NREM Sleep EEG Characteristics Correlate to the Mild Cognitive Impairment in Patients with Parkinsonism.
Cheng Zhang ... Jing Ma
BioMed Research International | VOL. 2021
24 Jul 2021

Increased serum levels of cyclophilin a and matrix metalloproteinase-9 are associated with cognitive impairment in patients with obstructive sleep apnea
Mengfan Li ... Zhangyong Xia
Sleep Medicine | VOL. 93
21 Oct 2021

Increased detection of mild cognitive impairment with type 2 diabetes mellitus using the Japanese version of the Montreal Cognitive Assessment: A pilot study
Yukiko Mori ... Mitsuru Kawamura
Neurology and Clinical Neuroscience | VOL. 3
11 Dec 2014
