Objective: Recent research has found that machine learning-based analysis of patient speech can be used to classify Alzheimer’s Disease. We know of no studies, however, that systematically explore the value of pausing events in speech for detecting cognitive limitations. Using retrospectively acquired voice data from paragraph memory tests, we created two types of pause features: a) the number and duration of pauses, and b) frequency components in speech immediately following pausing. Multiple machine learning models were used to assess how effectively these features could discriminate individuals classified into two groups: Cognitively Compromised versus Cognitively Well.

Participants and Methods: Participants (age > 65 years, n = 67) completed the Newcomer paragraph memory test and a neuropsychological protocol as part of a federally funded, prospective, IRB-approved investigation at the University of Florida. Participant vocal recordings were acquired for the immediate and delay conditions of the test. Speaker diarization was performed on the immediate free recall test condition to separate voices of patients from examiners. Features extracted from both test conditions included a) 3 pause characteristics (total number of pauses, total pause duration, and length of the longest pause), and b) 20 Mel Frequency Cepstral Coefficients (MFCC) pertaining to speech immediately (2.7 seconds) following pauses. These were combined with demographics (age, sex, race, education, and handedness) to create a total of 105 features that were used as inputs for multiple machine learning analytic models (random forest, logistic regression, naive Bayes, AdaBoost, Gradient Boost, and multi-layered perceptron).
External neuropsychological metrics were used to initially classify participants as Cognitively Compromised (i.e., < -1.0 standard deviation on > two of five test metrics: Hopkins Verbal Learning Test-Revised (HVLT-R) total immediate, delay, and discrimination; Controlled Oral Word Association (COWA) test; and category fluency ('animals')). Pearson Product Moment Correlations were used to assess the linear relationships between pauses and speech frequency categories and neuropsychological metrics.

Results: Neuropsychology metric classification using the -1 SD cut-off identified 27% (18/67 participants) as Cognitively Compromised. The Cognitively Compromised group and the Cognitively Well group did not show any difference in distributions of individual pause/frequency features (Mann-Whitney U test, p > 0.11). A negative correlation was found between total duration of short pauses and HVLT total immediate free recall, while a positive correlation was found between MFCC-10 and HVLT total immediate free recall. The best classification model was the AdaBoost classifier, which predicted the Cognitively Compromised label with 0.91 area under the receiver operating characteristic curve, 0.81 accuracy, 0.43 sensitivity, 1.0 specificity, 1.0 precision, and 0.6 F1 score.

Conclusions: Pause characteristics and frequency profiles of speech immediately following pauses from a paragraph memory test accurately identified older adults with compromised cognition, as measured by verbal learning and verbal fluency metrics. Furthermore, individuals with reduced HVLT immediate free recall generated more pauses, while individuals who recalled more words had higher power in mid-frequency bands (10th MFCC). Future research needs to replicate these findings and examine whether paragraph recall pause characteristics and the frequency profile of speech immediately following pauses can provide a low-resource alternative to automatic speech recognition models for detecting cognitive impairments.
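
As an illustration of the three pause characteristics named in the Methods (total number of pauses, total pause duration, and length of the longest pause), the following is a minimal sketch and not the authors' pipeline. It assumes the participant's speech has already been reduced to a list of (start, end) segments in seconds, for example by diarization or voice activity detection. The function name `pause_features` and the 0.25-second minimum pause length are hypothetical; the abstract does not state how a pause was defined.

```python
from typing import Dict, List, Tuple


def pause_features(
    speech_segments: List[Tuple[float, float]],
    min_pause_s: float = 0.25,  # hypothetical threshold, not taken from the abstract
) -> Dict[str, float]:
    """Summarise silent gaps between consecutive speech segments.

    speech_segments: (start, end) times in seconds for the participant's speech.
    A gap counts as a pause only if it lasts at least min_pause_s seconds.
    """
    pauses: List[float] = []
    prev_end = None
    for start, end in sorted(speech_segments):
        if prev_end is not None:
            gap = start - prev_end
            if gap >= min_pause_s:
                pauses.append(gap)
            # Guard against overlapping segments: keep the latest end time seen.
            prev_end = max(prev_end, end)
        else:
            prev_end = end
    return {
        "n_pauses": len(pauses),
        "total_pause_s": sum(pauses),
        "longest_pause_s": max(pauses, default=0.0),
    }


# Toy segments (made-up values, for illustration only).
segments = [(0.0, 1.0), (1.1, 2.0), (3.0, 4.0), (4.5, 5.0)]
print(pause_features(segments))
# {'n_pauses': 2, 'total_pause_s': 1.5, 'longest_pause_s': 1.0}
```

In this toy input the 0.1-second gap falls below the assumed threshold and is ignored, leaving two pauses of 1.0 and 0.5 seconds. Computing the same three values separately for the immediate and delay recordings would yield the per-condition pause features the Methods describe.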