Abstract

Automatic speech recognition (ASR) models usually require a large amount of training data, and models trained on small datasets tend to perform worse. This makes ASR difficult to apply to non-standard speech, such as that of cochlear implant (CI) patients, because such speech data are hard to collect owing to privacy concerns and limited access. In this paper, an effective fine-tuning and data augmentation method for ASR is proposed. Experiments compare the character error rate (CER) of an ASR model trained with the basic method and with the proposed method. The proposed method achieved a CER of 36.03% on the CI patients’ speech test dataset using only 2 h 30 min of training data, a 62% improvement over the basic method.
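CER, the metric reported above, is conventionally the character-level edit distance (substitutions + deletions + insertions) between the recognized text and the reference transcript, divided by the number of reference characters. The sketch below computes corpus-level CER this way. It is a minimal illustration, not the authors' evaluation script: the function names are hypothetical, and text normalization (for example, whether spaces or punctuation are counted) is an assumption, since the excerpt does not specify it.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to turn ref into hyp (classic dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # delete r from the reference
                curr[j - 1] + 1,          # insert h into the reference
                prev[j - 1] + (r != h),   # substitute (free if chars match)
            )
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters.
    Hypothetical helper; no text normalization is applied here."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be paired")
    chars = sum(len(r) for r in references)
    if chars == 0:
        raise ValueError("references contain no characters")
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return edits / chars


print(cer(["hello"], ["hallo"]))  # one substitution over five characters
```

Summing edits and characters over the whole test set, rather than averaging per-utterance CERs, keeps short utterances from dominating the score.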

Highlights

  • Various automatic speech recognition (ASR) models have been proposed in recent years, including the recurrent neural network transducer (RNN-T) [1], Listen, Attend and […]

  • Cochlear implants can restore speech understanding in people with severe hearing loss, especially sensorineural hearing loss

  • Because the speech of cochlear implant (CI) patients is already distorted, augmentation that modifies the raw audio was confirmed to be ineffective

Summary

Introduction

Various automatic speech recognition (ASR) models have been proposed in recent years, including the recurrent neural network transducer (RNN-T) [1], Listen, Attend and […]. ASR models are typically trained on standard speech datasets [4,5], so people with non-standard speech cannot effectively use ASR models trained on such data. We investigated an effective ASR training method to increase the recognition rate for the non-standard speech of CI patients. The biggest obstacle to learning from a non-standard dataset is finding sufficient data to train an ASR model [8]. Adversarial training can be used to generate training data by transforming standard speech into non-standard speech [9]. We used data augmentation, selecting augmentation methods [10,11] that have been applied to standard speech.
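This excerpt does not identify the augmentation methods cited as [10,11]. One widely used technique for standard speech that operates on extracted features rather than the raw waveform (raw-audio augmentation is reported above as ineffective for CI speech) is SpecAugment-style frequency and time masking. The sketch below illustrates only that masking idea as an assumption; it is not the authors' pipeline, time warping is omitted, and the function name, default widths, and the 0.0 mask value (which presumes mean-normalized features) are all hypothetical choices.

```python
import random


def spec_mask(spec: list[list[float]], num_freq_masks: int = 2,
              max_freq_width: int = 8, num_time_masks: int = 2,
              max_time_width: int = 20, seed=None) -> list[list[float]]:
    """Return a masked copy of spec, given as a list of time frames where
    each frame is a list of frequency-bin values (e.g. log-mel features).
    Hypothetical SpecAugment-style masking; time warping is not included."""
    rng = random.Random(seed)
    out = [list(frame) for frame in spec]  # copy so the input is untouched
    n_time = len(out)
    n_freq = len(out[0]) if n_time else 0

    # Frequency masking: zero the same band of bins in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_freq))
        start = rng.randint(0, n_freq - width)
        for frame in out:
            for f in range(start, start + width):
                frame[f] = 0.0  # 0.0 assumes mean-normalized features

    # Time masking: zero a run of consecutive frames across all bins.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_time))
        start = rng.randint(0, n_time - width)
        for t in range(start, start + width):
            out[t] = [0.0] * n_freq
    return out


features = [[1.0] * 10 for _ in range(30)]  # 30 frames x 10 bins, dummy data
augmented = spec_mask(features, seed=0)
```

Because each call draws new mask positions, applying the function on the fly during training yields a different variant of every utterance in each epoch, which is the usual way such masking stretches a small dataset.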

Materials and Methods
Base Model
Pre-Train Process
Finetuning
Dataset
Result
Method 3 Result
Comparison of the Results of the Three Methods
Conclusions