Abstract

Examination of speech datasets for detecting dementia, collected via various speech tasks, has revealed links between speech and cognitive abilities. However, the speech data available for this research are extremely limited because collecting speech and baseline data from patients with dementia in clinical settings is expensive. In this paper, we study the spontaneous speech dataset from a recent ADReSS challenge, a Cookie Theft Picture (CTP) dataset with groups of participants balanced in age, gender, and cognitive status. We explore state-of-the-art deep transfer learning techniques from the image, audio, speech, and language domains. One advantage of transfer learning is that it eliminates the need to design handcrafted features for each task and dataset. Transfer learning further mitigates the scarcity of dementia-relevant speech data by inheriting knowledge from similar but much larger datasets. Specifically, we built a variety of transfer learning models using the commonly employed MobileNet (image), YAMNet (audio), Mockingjay (speech), and BERT (text) models. Results indicated that the transfer learning models trained on text data performed significantly better than those trained on audio data. The performance gains of the text models may be due to the high similarity between the pre-training text dataset and the CTP text dataset. Our multi-modal transfer learning introduced only a slight improvement in accuracy, indicating that audio and text data provide limited complementary information. Multi-task transfer learning yielded limited improvements in classification and had a negative impact on regression. By analyzing the meaning behind the AD/non-AD labels and Mini-Mental State Examination (MMSE) scores, we observed that inconsistency between labels and scores could limit the performance of multi-task learning, especially when the outputs of the single-task models are highly consistent with the corresponding labels/scores.
In sum, we conducted a large comparative analysis of varying transfer learning models, focusing less on model customization and more on pre-trained models and pre-training datasets. We revealed insightful relations among models, data types, and data labels in this research area.

Highlights

  • The number of patients with Alzheimer’s Disease (AD) over the age of 65 is expected to reach 13.8 million by 2050, leading to a huge demand on the public health system (Alzheimer’s Association, 2020).

  • To apply Speech BERT to our AD classification task, the output of Speech BERT is fed into a 1D convolutional layer that convolves along the time dimension, then into a global average pooling layer that averages over time, and finally into a fully connected (FC) layer followed by a softmax activation layer.

  • Relation between Mini-Mental State Examination (MMSE) regression and AD classification: given the Alzheimer’s Dementia Recognition through Spontaneous Speech (ADReSS) dataset, we explored a threshold-based strategy to understand the relation between MMSE scores and AD/non-AD labels.
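The classification head described in the highlights (1D convolution through time, global average pooling, FC layer, softmax) can be sketched with NumPy. The dimensions and random weights below are illustrative assumptions, not the paper's actual hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """Valid 1D convolution along the time axis.
    x: (T, d_in) frame features; w: (k, d_in, d_out); b: (d_out,)."""
    k, d_in, d_out = w.shape
    T_out = x.shape[0] - k + 1
    out = np.stack([np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
                    for t in range(T_out)])
    return np.maximum(out, 0.0)  # ReLU

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative sizes: 100 frames of 768-dim encoder output, 64 conv filters,
# kernel width 3, two output classes (AD / non-AD).
T, d_model, d_conv, n_classes = 100, 768, 64, 2
features = rng.standard_normal((T, d_model))          # pre-trained encoder output
w_conv = rng.standard_normal((3, d_model, d_conv)) * 0.01
b_conv = np.zeros(d_conv)
w_fc = rng.standard_normal((d_conv, n_classes)) * 0.01
b_fc = np.zeros(n_classes)

h = conv1d(features, w_conv, b_conv)    # (98, 64): convolve through time
pooled = h.mean(axis=0)                 # global average pooling over time
probs = softmax(pooled @ w_fc + b_fc)   # class probabilities from FC + softmax
```

The pooling step collapses the variable-length time axis into a fixed-size vector, which is what lets the same head handle recordings of different durations.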

Summary

INTRODUCTION

The number of patients with Alzheimer’s Disease (AD) over the age of 65 is expected to reach 13.8 million by 2050, leading to a huge demand on the public health system (Alzheimer’s Association, 2020). Koo et al. (2020) and Pompili et al. (2020) employed transfer learning techniques to extract both acoustic and linguistic features from pre-trained models, combined these features with handcrafted features, and customized a convolutional recurrent neural network to perform the downstream tasks. Their customized network architectures, though different in detail, produced similar results and conclusions. We investigated whether the two downstream tasks are highly correlated and whether integrated training can reinforce the performance of both tasks.
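A minimal illustration of the threshold-based strategy relating the two tasks is a function that maps an MMSE score to a binary AD/non-AD label. The cutoff of 24 used here is a commonly cited clinical convention, not necessarily the boundary underlying the ADReSS labels:

```python
def mmse_to_label(mmse_score, threshold=24):
    """Map an MMSE score (0-30) to a binary AD/non-AD label.

    threshold=24 is a widely used clinical cutoff for cognitive
    impairment; it is an assumption here, chosen for illustration.
    """
    if not 0 <= mmse_score <= 30:
        raise ValueError("MMSE scores range from 0 to 30")
    return "AD" if mmse_score < threshold else "non-AD"
```

If the regression model's predicted MMSE scores, once thresholded this way, agree with the classifier's predicted labels, the two tasks are consistent; disagreements indicate the label/score inconsistency the paper discusses as a limit on multi-task learning.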

SPEECH DATASET FOR DEMENTIA RESEARCH
Image Dataset
Audio Dataset
Speech Dataset
Text Dataset
DEEP TRANSFER LEARNING MODEL
Supervised Classification Approach
Customizing model for the downstream task
Self-Supervised Learning Approach
MULTI-MODAL TRANSFER LEARNING
MULTI-TASK TRANSFER LEARNING
PERFORMANCE EVALUATION
Implementation Details
Training Strategy
Evaluation Metrics
Evaluation of Deep Transfer Learning
Evaluation of Multi-Modal Transfer
Evaluation of Multi-Task Transfer
Summary of Best Cases Using Transfer
CONCLUSIONS
Multi-Modal Transfer Learning Reveals
Multi-Task Transfer Learning Reveals
DATA AVAILABILITY STATEMENT