Abstract

Background
Changes in speech occur in early-stage Alzheimer's disease (AD). A range of approaches have been used for eliciting and automatically analysing speech, but limited research has directly compared these methods.

Method
Participants from the AMYPRED-UK (NCT04828122) and AMYPRED-US (NCT04928976) studies completed four speech-elicitation tasks: the Automatic Story Recall Task (ASRT), the Logical Memory Test (LMT), and Semantic (Animals, Vegetables, Fruit) and Phonemic (F, A, S) Verbal Fluency Tasks. Responses were recorded and automatically transcribed. Analyses were completed with Novoic's speech analysis software, using four key approaches: (1) feature extraction, (2) representations from large language models (LLMs), (3) text-similarity evaluation, and (4) autoscoring. Together, these evaluated audio, linguistic, and temporal speech domains. Outputs were entered into logistic regression models predicting Mild Cognitive Impairment (MCI)/mild AD and amyloid positivity in MCI/mild AD. Areas under the ROC curve (AUC) were evaluated via 5-fold cross-validation.

Result
165 older adults (including N = 74 with MCI and N = 9 with mild AD, of whom N = 38 were MCI/mild AD and amyloid beta positive) provided data for all tasks. Strength of model predictions varied by task, analytic approach, and domain (Fig 1). AUCs predicting MCI/mild AD were generally higher for ASRTs than for the other tasks (Fig 1A). Text-similarity approaches (G-match: similarity of word embeddings between the source text and the retelling; V-match: string-based matching) produced the highest AUCs for the ASRTs (G-match AUC = 0.87-0.88) and the LMT (V-match AUC = 0.83-0.85). Autoscoring produced the strongest predictor for semantic fluency (AUC = 0.85) and one of the strongest for phonemic fluency (AUC = 0.72). Linguistic and temporal speech characteristics were more sensitive to MCI/mild AD than audio-only metrics. Predictions of amyloid positivity in MCI/mild AD were weaker overall (Fig 1B), with the most consistent pattern being better performance of text-similarity approaches on the story recall tasks (up to AUC = 0.77 for the ASRTs and 0.73 for the LMT).

Conclusion
Selection of task and analytic approach is important for developing more sensitive speech-based testing. The best-performing tasks (ASRT, semantic fluency) and metrics (text-similarity and autoscoring) have been incorporated into Novoic's Storyteller automated remote speech-based test.
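To make the two text-similarity metrics concrete, the sketch below illustrates the general idea under stated assumptions: a G-match-style score as cosine similarity between sentence embeddings of the source story and the retelling, and a V-match-style score as string-based token overlap. This is not Novoic's implementation; the embedding model name and the scoring functions are illustrative placeholders.

```python
# Illustrative sketch only; not the actual G-match/V-match implementation.
import difflib
import numpy as np
from sentence_transformers import SentenceTransformer  # any sentence-embedding model would do

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def embedding_similarity(source: str, retelling: str) -> float:
    """G-match-style score: cosine similarity between embeddings of the
    source text and the participant's retelling."""
    src_vec, ret_vec = model.encode([source, retelling])
    return float(np.dot(src_vec, ret_vec) /
                 (np.linalg.norm(src_vec) * np.linalg.norm(ret_vec)))

def string_match_similarity(source: str, retelling: str) -> float:
    """V-match-style score: string-based overlap between the tokenised
    source text and the retelling."""
    return difflib.SequenceMatcher(
        None, source.lower().split(), retelling.lower().split()
    ).ratio()

source_text = "The ship left the harbour at dawn carrying wool and timber."
retelling = "A ship sailed out of the harbour in the morning with wool."
print(f"embedding similarity: {embedding_similarity(source_text, retelling):.2f}")
print(f"string-match similarity: {string_match_similarity(source_text, retelling):.2f}")
```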

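The evaluation setup (logistic regression on speech-derived outputs, with AUC estimated via 5-fold cross-validation) can likewise be sketched as below. The feature matrix and labels here are random placeholders standing in for the study's speech metrics and MCI/mild AD labels; only the modelling and cross-validation pattern is meant to be illustrative.

```python
# Hypothetical sketch of the evaluation approach described in the Method section.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(165, 8))        # placeholder speech metrics (e.g. similarity scores, timing features)
y = rng.integers(0, 2, size=165)     # placeholder labels: MCI/mild AD vs cognitively normal

# Logistic regression classifier with standardised inputs.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# AUC evaluated via stratified 5-fold cross-validation.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(f"mean AUC across folds: {aucs.mean():.2f}")
```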