Abstract

Background
Cognitive tests that require a spoken response, such as paragraph recall, are rich in cognitive‐related information that is not captured using traditional scoring methods. This study aimed to determine whether linguistic features embedded in spoken responses can differentiate between individuals who are and are not cognitively impaired.

Method
We used linguistic text analysis software (Linguistic Inquiry Word Count) to identify lexical and grammatical features from manually transcribed recordings of Logical Memory immediate (IR) and delayed recall (DR) from a subset (N = 169) of Long Life Family Study participants. We used regression models with Generalized Estimating Equations adjusted by age, sex, education, and within‐family correlation to select features associated with cognitive status (normal cognition [NC] versus cognitive impairment [CI]; threshold p < 0.1). We developed a “polyfeature score” (PFS) for both immediate and delayed recall, each calculated as a weighted sum of the selected lexical features for each recall condition. We then built a logistic regression model to evaluate the predictive value of the PFSs for identifying cognitively impaired individuals.

Result
The sample included 133 participants with NC and 36 with CI (mean age = 75.8 ± 11.7 years, 56% female). The regression identified 11 features for IR (e.g., percent of prepositions, words ≥ 6 letters) and 13 features for DR (e.g., percent of perceptual process words, percent of conjunctions) that significantly predicted cognitive status. Only 45% of features selected in IR were also selected in DR. Both PFSs were highly associated with cognitive status (PFS_IR β = 1.48, p < 0.001; PFS_DR β = 0.67, p = 0.001). The PFS yielded high discriminative value for cognitive status (PFS_IR AUC = 0.96, sensitivity = 0.73, specificity = 0.86; PFS_DR AUC = 0.90, sensitivity = 0.82, specificity = 0.86).
Combining lexical features from both recall conditions did not increase the predictive value of the models.

Conclusion
Lexical features from immediate paragraph recall alone provide sufficient information for classifying cognitive status, increasing its potential as a cognitive screener in clinical settings. Additionally, each recall condition identified unique lexical features associated with cognitive impairment, which may help elucidate processes underlying deficits in learning and recall.
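
As a rough illustration of the scoring idea described in the Method, the sketch below computes a polyfeature score as a weighted sum of selected features and then summarizes how well that score separates two groups using AUC, sensitivity, and specificity. It is a minimal sketch under stated assumptions, not the study's pipeline: the abstract does not say how the weights or the classification threshold were obtained, so the feature names (`prep`, `sixltr`), the weights, the threshold, and the four toy participants are all hypothetical. It also ranks participants by the raw score and skips the covariate-adjusted logistic regression step.

```python
from typing import Dict, List, Tuple


def polyfeature_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over the selected features only (the keys of `weights`)."""
    return sum(w * features[name] for name, w in weights.items())


def auc(scores: List[float], labels: List[int]) -> float:
    """Share of (impaired, unimpaired) pairs in which the impaired case
    (label 1) has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Flag a case as impaired when its score is at or above `threshold`."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical weights for two selected features (not values from the study).
weights = {"prep": -0.5, "sixltr": -0.25}

# Toy participants: feature percentages are invented; label 1 = impaired.
participants = [
    ({"prep": 12.0, "sixltr": 16.0, "filler": 1.0}, 0),  # "filler" is not selected
    ({"prep": 10.0, "sixltr": 12.0}, 0),
    ({"prep": 6.0, "sixltr": 8.0}, 1),
    ({"prep": 8.0, "sixltr": 16.0}, 1),
]

scores = [polyfeature_score(f, weights) for f, _ in participants]
labels = [y for _, y in participants]

print("scores:", scores)
print("AUC:", auc(scores, labels))
print("sens, spec:", sensitivity_specificity(scores, labels, threshold=-8.0))
```

Features that were not selected (such as `filler` above) are ignored because the sum iterates over the weight keys. Computing one score per recall condition, as the abstract describes for IR and DR, would amount to calling `polyfeature_score` with a separate weight dictionary for each condition.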