Objective: Cognitive tests requiring spoken responses, such as paragraph recall, are rich in cognition-related information that traditional scoring methods do not capture. This study aimed to determine whether linguistic features embedded in spoken responses differentiate between individuals with and without cognitive impairment.

Participants and Methods: Participants in the Long Life Family Study completed a neuropsychological assessment that included the WMS-R Logical Memory I paragraph recall. For a subset of participants (N = 709), test responses were digitally recorded and manually transcribed. We used Linguistic Inquiry and Word Count, a text analysis program, to quantify word counts, grammatical features (e.g., prepositions, verb tenses), and the use of content words related to specific semantic categories (e.g., work-related, numbers) for immediate recall (IR) and delayed recall (DR). We used regression models with Generalized Estimating Equations, adjusted for age, sex, education, and within-family correlation, to select features associated with cognitive status (normal cognition [NC] versus cognitive impairment [CI]; Bonferroni-corrected threshold p < 0.001). Next, we developed a “polyfeature score” (PFS) for each recall condition, calculated as a weighted sum of the selected linguistic features. We then built a logistic regression model to evaluate the predictive value of each PFS for identifying cognitively impaired individuals. In secondary analyses, we used regression models as above to identify features associated with mild cognitive impairment subtype (amnestic [aMCI] versus nonamnestic [naMCI]; threshold p < .05).

Results: The sample included 599 participants with NC and 110 with CI (mean age = 72.3 ± 11.0 years, 54% female). The regression models identified 8 linguistic features for IR and 7 for DR that significantly predicted cognitive status.
Decreased use of content words related to work (e.g., employed, school, police) and biological processes (e.g., cook, cafeteria, eat), and the use of negations (e.g., no, not, can’t), were predictive of cognitive impairment in both recall conditions. In contrast, the use of other content word categories was predictive of cognitive status in only one recall condition (IR: leisure, cognitive processes, space; DR: drives, number). The use of fewer prepositions in IR, more first-person pronouns in DR, and fewer words in the past tense in DR were each associated with cognitive impairment. Word count was not predictive of cognitive status. Both PFSs were highly associated with cognitive status (PFS_IR β = 0.74, p < 0.001; PFS_DR β = 0.86, p = 0.001) with high discriminative value (PFS_IR AUC = 0.93, sensitivity = 0.81, specificity = 0.91; PFS_DR AUC = 0.95, sensitivity = 0.77, specificity = 0.88). In the CI subset, linguistic features differed between those classified as aMCI (n = 24) and naMCI (n = 40). Two function word categories predicted aMCI in IR, whereas decreased word count, two function word categories, and two content word categories predicted aMCI in DR (all p < .05).

Conclusions: Linguistic features from paragraph recall provide high predictive value for classifying cognitive status, increasing the potential of paragraph recall as a cognitive screener in clinical settings. Additionally, each recall condition identified unique linguistic features associated with cognitive impairment, which may aid differentiation of cognitive impairment subtypes and elucidate processes underlying deficits in learning and recall.
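
Illustrative sketch: The polyfeature score described in the Methods is a weighted sum of selected linguistic features, evaluated by how well it separates impaired from unimpaired participants. The Python sketch below shows that arithmetic on made-up data. The feature names, weights, signs, participant values, and threshold are all hypothetical and are not taken from the study; the sketch also scores discrimination directly from the PFS rather than fitting the logistic regression model the authors describe.

```python
from typing import Dict, Sequence, Tuple


def polyfeature_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over the selected features (only keys present in weights)."""
    return sum(w * features[name] for name, w in weights.items())


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Chance that a random impaired case (label 1) outscores a random
    unimpaired case (label 0); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Classify score >= threshold as impaired, then compare with labels."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


# Hypothetical weights; in the study these would come from the fitted models.
WEIGHTS = {"work": -0.5, "negation": 0.75, "preposition": -0.25}

# Hypothetical per-participant feature values with labels (1 = impaired).
PARTICIPANTS = [
    ({"work": 4.0, "negation": 0.0, "preposition": 12.0}, 0),
    ({"work": 3.0, "negation": 1.0, "preposition": 10.0}, 0),
    ({"work": 1.0, "negation": 2.0, "preposition": 6.0}, 0),
    ({"work": 1.0, "negation": 2.0, "preposition": 8.0}, 1),
    ({"work": 0.0, "negation": 3.0, "preposition": 6.0}, 1),
]

scores = [polyfeature_score(f, WEIGHTS) for f, _ in PARTICIPANTS]
labels = [y for _, y in PARTICIPANTS]

print("PFS per participant:", scores)
print("AUC:", auc(scores, labels))
print("Sensitivity, specificity at -2.0:", sensitivity_specificity(scores, labels, -2.0))
```

The pairwise AUC used here is equivalent to the area under the ROC curve and avoids any library dependency; with real data, a standard statistics package would be the more usual choice.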