Abstract
Automatic speech recognition (ASR) models usually require a large amount of training data to achieve better results than models trained with only a small amount of data. It is difficult to apply ASR models to non-standard speech, such as that of cochlear implant (CI) patients, owing to privacy concerns or difficulty of access. In this paper, an effective fine-tuning and augmentation method for ASR is proposed. Experiments compare the character error rate (CER) of ASR models trained with the basic and the proposed method. The proposed method achieved a CER of 36.03% on the CI patients’ speech test dataset using only 2 h and 30 min of training data, a 62% improvement over the basic method.
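The evaluation metric above, character error rate, is the character-level edit distance between the reference transcript and the ASR hypothesis, normalized by the reference length. A minimal sketch (not the paper's evaluation code; the implementation here is a standard Levenshtein dynamic program):

```python
# Sketch: character error rate (CER) via Levenshtein distance.
# CER = (substitutions + deletions + insertions) / len(reference)
def cer(reference: str, hypothesis: str) -> float:
    m, n = len(reference), len(hypothesis)
    # dp[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all remaining reference chars
    for j in range(n + 1):
        dp[0][j] = j  # insert all remaining hypothesis chars
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n] / m

print(cer("speech", "speach"))  # one substitution over 6 chars ≈ 0.1667
```

A CER of 36.03% thus means roughly one character-level error per three reference characters.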
Highlights
Various automatic speech recognition (ASR) models have been proposed in recent years, including the recurrent neural network transducer (RNN-T) [1] and Listen, Attend and Spell (LAS)
Speech understanding can be restored through cochlear implants in people with severe hearing loss, especially sensorineural hearing loss
Because the cochlear implant (CI) patients’ speech is already distorted, augmentation methods that modify the raw audio waveform are not effective
Summary
Various automatic speech recognition (ASR) models have been proposed in recent years, including the recurrent neural network transducer (RNN-T) [1] and Listen, Attend and Spell (LAS). ASR models are typically trained using standard speech datasets [4,5]. People with non-standard speech therefore cannot benefit from ASR models trained only on standard speech. We experimented with an effective ASR method to increase the recognition rate of non-standard speech from CI patients. The biggest hindrance in learning from a non-standard dataset is finding sufficient data to train an ASR model [8]. Adversarial training can be used to generate training data by transforming standard speech into non-standard speech [9]. We also used a data augmentation technique, selecting augmentation methods [10,11] designed for standard speech.
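Since the highlights note that augmenting the raw audio is ineffective for already-distorted CI speech, a spectrogram-level alternative is the natural candidate. The sketch below illustrates time and frequency masking in the style of SpecAugment; the exact augmentation method and mask sizes used in the paper are assumptions here, not its published configuration:

```python
import numpy as np

# Illustrative sketch: SpecAugment-style masking applied to a log-mel
# spectrogram instead of the raw waveform. Mask widths are assumptions.
def spec_augment(spec, max_freq_mask=8, max_time_mask=20, rng=None):
    """spec: (n_mels, n_frames) array; returns a masked copy."""
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_mels, n_frames = out.shape
    # Zero out a random band of frequency channels.
    f = int(rng.integers(0, max_freq_mask + 1))
    f0 = int(rng.integers(0, max(1, n_mels - f)))
    out[f0:f0 + f, :] = 0.0
    # Zero out a random span of time frames.
    t = int(rng.integers(0, max_time_mask + 1))
    t0 = int(rng.integers(0, max(1, n_frames - t)))
    out[:, t0:t0 + t] = 0.0
    return out

aug = spec_augment(np.ones((80, 100)), rng=np.random.default_rng(0))
```

Masking features rather than perturbing the waveform leaves the distorted CI speech signal itself untouched while still regularizing the acoustic model.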