Abstract

We investigate the performance of self-supervised pretraining frameworks on pathological speech datasets used for automatic speech recognition (ASR). Modern end-to-end models require thousands of hours of data to train well, but only a small number of pathological speech datasets are publicly available. A proven solution to this problem is to first pretrain the model on a large amount of healthy speech data and then fine-tune it on the pathological speech datasets. A recent pretraining framework, self-supervised learning (SSL), trains a network using only speech data without transcriptions, which relaxes the training data requirements and allows more speech data to be used in pretraining. We investigate SSL frameworks such as wav2vec 2.0 and WavLM under different setups and compare their performance with different supervised pretraining setups on two types of pathological speech, namely Japanese electrolaryngeal speech and English dysarthric speech. Our results show that although SSL has proven successful for minimally resourced healthy speech, we do not find this to be the case for pathological speech. The best supervised setup outperforms the best SSL setup by 13.9% character error rate on electrolaryngeal speech and 16.8% word error rate on dysarthric speech.
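To make the pretrain-then-fine-tune recipe concrete, the sketch below loads a wav2vec 2.0 model pretrained on healthy English speech and takes one fine-tuning step on a pathological speech utterance with CTC loss, using the Hugging Face Transformers library. The checkpoint name, the frozen-feature-encoder choice, and the hyperparameters are illustrative assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch: fine-tune an SSL-pretrained wav2vec 2.0 model on pathological speech.
# Assumptions: 16 kHz mono audio, a Hugging Face checkpoint pretrained on healthy speech,
# and hyperparameters chosen only for illustration.
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Model pretrained (self-supervised) on large amounts of healthy speech.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# With small pathological corpora, a common choice is to freeze the convolutional
# feature encoder and update only the Transformer layers and the CTC head.
model.freeze_feature_encoder()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def fine_tune_step(waveform_16khz, transcript):
    """One gradient step on a single (audio, text) pair of pathological speech."""
    inputs = processor(waveform_16khz, sampling_rate=16_000, return_tensors="pt")
    labels = processor(text=transcript, return_tensors="pt").input_ids
    loss = model(input_values=inputs.input_values, labels=labels).loss  # CTC loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In a full setup this step would run over many epochs on the pathological training set, with the character or word error rate measured on a held-out test set to compare SSL and supervised pretraining.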
