Abstract

Background
Language deficits may occur in the early stages of Alzheimer's disease (AD), and analysis of an individual's language changes has been shown to predict the diagnosis and severity of AD. This work explored machine learning methods to detect AD from voice recordings of the Cookie Theft picture description task.

Method
An open-source dataset from DementiaBank, which included manual transcripts and audio recordings of the Cookie Theft picture description task from 309 individuals with AD and 243 cognitively healthy older adults, was employed in the classification analysis. We analyzed transcripts generated using three different approaches (Figure 1): a) automatic transcripts from denoised voice recordings; b) automatic transcripts without the denoising step; and c) manual transcripts. Denoising was implemented using noisereduce, a Python library that minimizes background noise (CITATION). Automatic transcripts were produced using two toolkits: the CMU Sphinx PocketSphinx recognizer (PS) and Mozilla DeepSpeech (DS). The Small Bidirectional Encoder Representations from Transformers (BERT) model was used to extract high-dimensional feature vectors providing a contextual representation of each transcript for classification. The experiment was designed to compare transcript type (manual vs. automatic), recording quality (noisy vs. denoised), and speech recognition method (DS vs. PS). A neural network model was constructed to classify individuals with AD versus healthy controls. Classification was evaluated with 10-fold cross-validation, repeated 20 times, and the averaged results were reported (see Table 1).

Result
The performance of each approach is shown in Table 1.
Overall, the model using features extracted from DS transcripts of denoised speech recordings achieved the best performance, with 92.72% validation accuracy for AD diagnosis. The comparison between raw and denoised speech recordings showed conflicting results: features from PS transcripts performed better with noisy recordings than with denoised ones, whereas the opposite held for features from DS transcripts.

Conclusion
The analysis approach combining transcription-driven recognition and transfer learning showed the potential for better performance in AD diagnosis. Further investigation is needed to understand the reasons for its superior performance.
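The denoising step above relies on noisereduce, which is based on spectral gating: estimate a per-frequency noise floor from noise-dominated audio, then suppress spectral bins of the signal that fall below that floor. As a rough illustration of the idea only (not the library's implementation; the function name, frame length, and threshold multiplier here are illustrative choices), a minimal numpy sketch:

```python
import numpy as np

def spectral_gate(signal, noise_clip, frame_len=256, thresh_mult=1.5):
    """Toy spectral-gating sketch: estimate a per-frequency noise floor
    from a noise-only clip, then zero out spectral bins of each signal
    frame whose magnitude falls below that floor."""
    # Estimate the noise floor per frequency bin from the noise-only clip
    usable = len(noise_clip) // frame_len * frame_len
    noise_frames = noise_clip[:usable].reshape(-1, frame_len)
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1))
    floor = noise_mag.mean(axis=0) * thresh_mult

    # Gate each frame of the signal against the noise floor
    out = np.zeros(len(signal), dtype=float)
    for i in range(len(signal) // frame_len):
        frame = signal[i * frame_len : (i + 1) * frame_len]
        spec = np.fft.rfft(frame)
        mask = np.abs(spec) >= floor  # keep only bins above the floor
        out[i * frame_len : (i + 1) * frame_len] = np.fft.irfft(spec * mask, n=frame_len)
    return out
```

Real spectral-gating implementations smooth the mask over time and frequency to avoid the musical-noise artifacts this hard gate produces; the sketch keeps only the core thresholding step.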
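The evaluation protocol in the abstract (10-fold cross-validation, repeated 20 times, with averaged results) can be sketched as follows. This is a schematic of the protocol only: the feature matrix stands in for the BERT-derived vectors, a nearest-centroid classifier stands in for the paper's neural network, and the function name is hypothetical.

```python
import numpy as np

def crossval_accuracy(X, y, n_folds=10, n_repeats=20, seed=0):
    """Repeated k-fold cross-validation with reshuffled folds per
    repeat, returning the accuracy averaged over all folds and
    repeats. A nearest-centroid classifier stands in for the model."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))          # reshuffle before each repeat
        folds = np.array_split(idx, n_folds)
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            # "Train": one mean feature vector (centroid) per class
            cents = {c: X[train][y[train] == c].mean(axis=0)
                     for c in np.unique(y[train])}
            classes = sorted(cents)
            # Predict the class whose centroid is nearest to each test point
            d = np.stack([np.linalg.norm(X[test] - cents[c], axis=1)
                          for c in classes])
            pred = np.array(classes)[d.argmin(axis=0)]
            accs.append((pred == y[test]).mean())
    return float(np.mean(accs))
```

Repeating the cross-validation with different fold assignments, as done here and in the abstract, reduces the variance of the accuracy estimate that comes from any single random split.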
