Linguistic markers of Alzheimer’s in Spanish speakers: Automated metrics for free speech and verbal fluency tasks

Adolfo M Garcia

doi:10.1002/alz.062609

Abstract

Background

Automated speech analysis can reveal objective, scalable markers of Alzheimer’s disease (AD). Yet, few studies have targeted Spanish, the most spoken language in Latin America, where AD is quickly escalating. Also, most research presents low interpretability, targeting linguistic features unrelated to the disorder’s core neuropsychological profile. Working with Hispanics, we have validated novel automated metrics capturing critical lexico‐semantic targets for AD: semantic granularity (coarseness of concepts) and ongoing semantic variability (conceptual closeness of successive words) (Figure 1A). I aim to describe and illustrate these metrics via two experiments comparing AD patients and healthy controls (HCs) on free speech and verbal fluency tasks. To test for specificity, replications were conducted in Parkinson’s disease (PD) and behavioral variant frontotemporal dementia (bvFTD) cohorts.

Method

In Experiment‐1, 21 AD patients, 16 HCs, and 18 PD patients performed spontaneous and semi‐spontaneous speech tasks. In Experiment‐2, 32 AD patients, 27 HCs, 19 PD patients, and 32 bvFTD patients performed phonological and semantic fluency tasks. In both studies, responses were transcribed and lemmatized. Semantic granularity scores were computed with Python’s NLTK library, which was used to access WordNet, a lexical taxonomy leading from hypernyms (top node: ‘entity’) to hyponyms (e.g., ‘dog’). Each word’s granularity is defined as the distance between its node and ‘entity’ (Figure 1B). Ongoing semantic variability was analyzed with a FastText model pre‐trained on Spanish corpora. Each unique word in the vocabulary was assigned a vector, and the distance between words was quantified with the cosine of the angle between their assigned vectors (Figure 1C). Extracted features from both dimensions were analyzed through ANOVAs and XGBoost classifiers.

Results

In Experiment‐1, relative to HCs, AD (but not PD) patients used more unspecific and fewer specific concepts, alongside more discontinuous conceptual choices. These features robustly discriminated AD patients from HCs, but not PD patients from HCs. In Experiment‐2, only AD patients exhibited reduced semantic granularity, which contributed to robust subject‐level classification. Ongoing semantic variability failed to distinguish among groups.

Conclusion

These novel digital metrics can reveal objective, interpretable, and specific markers of AD in Spanish speakers, providing culture‐specific data for a vast but understudied linguistic community.
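
The two metrics described in the Method can be illustrated with a minimal sketch. It is not the study's pipeline: the taxonomy `TOY_HYPERNYMS` and the two‐dimensional vectors in `TOY_VECTORS` are hypothetical stand‐ins for WordNet and for the pre‐trained Spanish FastText embeddings, and the function names are invented for illustration. Granularity is computed as the number of hypernym steps from a word's node up to ‘entity’, and ongoing semantic variability is approximated by the cosine between the vectors of each pair of successive words.

```python
import math

# Hypothetical toy taxonomy (child -> parent). A stand-in for WordNet,
# not real WordNet data; 'entity' is the top node, as in the abstract.
TOY_HYPERNYMS = {
    "entity": None,
    "animal": "entity",
    "mammal": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "thing": "entity",
}

# Hypothetical 2-D word vectors. Real FastText vectors have hundreds of
# dimensions; these values are chosen only to keep the arithmetic readable.
TOY_VECTORS = {
    "dog": (1.0, 0.0),
    "cat": (0.6, 0.8),
    "thing": (0.0, 1.0),
}


def granularity(word, hypernyms=TOY_HYPERNYMS):
    """Count the hypernym steps from `word` up to the root node 'entity'."""
    depth = 0
    node = word
    while hypernyms[node] is not None:
        node = hypernyms[node]
        depth += 1
    return depth


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def successive_similarities(words, vectors=TOY_VECTORS):
    """Cosine between the vectors of each pair of successive words."""
    return [
        cosine(vectors[first], vectors[second])
        for first, second in zip(words, words[1:])
    ]


if __name__ == "__main__":
    # Specific concepts sit deeper in the taxonomy than unspecific ones.
    for word in ("dog", "thing"):
        print(word, granularity(word))
    # One cosine value per transition between successive words.
    print(successive_similarities(["dog", "cat", "thing"]))
```

In this toy setting, a specific word such as ‘dog’ lies three steps below ‘entity’, whereas an unspecific word such as ‘thing’ lies one step below it. With NLTK's WordNet interface, a synset's depth is available through methods such as `min_depth()`, although the abstract does not say which depth measure or word‐sense selection was used. How the successive cosine values were summarized per participant is likewise not stated here, so the sketch returns the raw sequence.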
