Representation learning strategies to model pathological speech: Effect of multiple spectral resolutions

Gabriel Figueiredo Miller, Juan Rafael Orozco-Arroyave, Elmar Nöth, Juan Camilo Vásquez-Correa

doi:10.1016/j.csl.2023.101584


Open Access



Abstract

This paper considers a representation learning strategy to model speech signals from patients with Parkinson’s disease, with two goals: detecting the presence of the disease and evaluating the level of degradation of a patient’s speech. In particular, we propose a novel fusion strategy, called the multi-spectral autoencoder, that combines wideband and narrowband spectral resolutions using autoencoder-based representation learning. The proposed model classifies the speech of Parkinson’s disease patients with an accuracy of up to 97% and assesses their dysarthria severity with a Spearman correlation of up to 0.79. These results outperform those reported in the literature for the same problem on the same corpus.
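To make the two spectral resolutions concrete, the following minimal sketch computes the layout of a short-time Fourier transform for a short analysis window (wideband: fine time resolution, coarse frequency resolution) and a long one (narrowband: the opposite trade-off). The sampling rate, window lengths, hop sizes, and the names `StftLayout` and `stft_layout` are illustrative assumptions, not values or identifiers taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class StftLayout:
    win_samples: int           # analysis window length in samples
    hop_samples: int           # shift between consecutive frames in samples
    n_freq_bins: int           # one-sided spectrum size for a real signal
    freq_resolution_hz: float  # spacing between adjacent frequency bins
    n_frames: int              # number of complete frames (no padding)


def stft_layout(n_samples: int, fs: int, win_ms: float, hop_ms: float) -> StftLayout:
    """Describe the time-frequency grid produced by a given window and hop."""
    win = int(round(fs * win_ms / 1000))
    hop = int(round(fs * hop_ms / 1000))
    # Count only frames that fit entirely inside the signal.
    n_frames = 0 if n_samples < win else 1 + (n_samples - win) // hop
    return StftLayout(win, hop, win // 2 + 1, fs / win, n_frames)


FS = 16000       # assumed sampling rate in Hz
N_SAMPLES = FS   # one second of speech

# Assumed window settings, chosen only to illustrate the trade-off.
wideband = stft_layout(N_SAMPLES, FS, win_ms=5, hop_ms=2.5)
narrowband = stft_layout(N_SAMPLES, FS, win_ms=40, hop_ms=20)

print("wideband:  ", wideband)
print("narrowband:", narrowband)
```

With these assumed settings, the wideband grid has many frames and few, widely spaced frequency bins, while the narrowband grid has few frames and many closely spaced bins. The two views carry complementary information, which is what a fusion strategy can exploit.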

Full Text