Abstract

Modern automatic speech recognition (ASR) systems can achieve high accuracy rates, depending on the methodology applied and the datasets used. Accuracy decreases significantly, however, when an ASR system is used with a non-native speaker of the language to be recognized. The main reason for this is the set of pronunciation and accent features related to the speaker's mother tongue, which influence the pronunciation of the second language. At the same time, the extremely limited volume of labeled non-native speech datasets makes it difficult to train sufficiently accurate ASR systems for non-native speakers from the ground up. In this research, we address this problem and its influence on the accuracy of ASR systems using a style transfer methodology. We designed a pipeline for modifying the speech of a non-native speaker so that it more closely resembles native speech. This paper covers experiments on accent modification using different setups and approaches, including neural style transfer and an autoencoder. The experiments were conducted on English speech pronounced by Japanese speakers (the UME-ERJ dataset). The results show a significant relative improvement in speech recognition accuracy. Our methodology removes the need to train new algorithms for non-native speech (thus overcoming the obstacle of data scarcity) and can be used as a wrapper for any existing ASR system. The modification can be performed in real time, before a sample is passed to the speech recognition system itself.
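The wrapper architecture described above — converting non-native speech features before handing them to an unmodified recognizer — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, the random linear encoder/decoder standing in for a trained accent-conversion autoencoder, and the `recognize` stub are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions: 80 log-mel bins per frame.
N_MELS, LATENT = 80, 16

# Stand-in for a trained accent-conversion autoencoder: here the
# encoder/decoder are random linear maps purely for illustration.
W_enc = rng.standard_normal((N_MELS, LATENT)) * 0.1
W_dec = rng.standard_normal((LATENT, N_MELS)) * 0.1

def accent_convert(mel_frames: np.ndarray) -> np.ndarray:
    """Map non-native speech features toward native-like features.

    mel_frames: (T, N_MELS) log-mel spectrogram of one utterance.
    Returns features of the same shape, ready for any ASR front end.
    """
    latent = np.tanh(mel_frames @ W_enc)   # encode
    return latent @ W_dec                  # decode

def recognize(mel_frames: np.ndarray) -> str:
    """Placeholder for an off-the-shelf ASR system (left untouched)."""
    return "<transcript>"

# The wrapper: conversion happens before recognition, so the ASR
# system itself needs no retraining on non-native data.
utterance = rng.standard_normal((120, N_MELS))
text = recognize(accent_convert(utterance))
```

Because the conversion step is a small feed-forward pass per utterance, it can plausibly run in real time ahead of any existing recognizer, which is the property the abstract emphasizes.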

Highlights

  • Automatic speech recognition is a function that has been the subject of extensive research for decades

  • The LibriSpeech test-clean dataset was used for evaluating the Time Delay Neural Network (TDNN)-based automatic speech recognition (ASR) network we trained; the result achieved in our test was a 10% Character Error Rate (CER) and a 12.5% Word Error Rate (WER)

  • In this research, we explained the problem of non-native speech recognition and the reason why training ASR systems adapted for such speech may be problematic
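The CER and WER figures quoted in the highlights are standard edit-distance metrics: the Levenshtein distance between the reference and the hypothesis, normalized by the reference length in characters or words respectively. A minimal sketch, assuming the standard definitions (this helper code is not from the paper):

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences via one-row dynamic programming."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution/match
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)]

def wer(ref: str, hyp: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    r, h = ref.split(), hyp.split()
    return levenshtein(r, h) / len(r)

def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: character-level edit distance over reference length."""
    return levenshtein(ref, hyp) / len(ref)
```

For example, `wer("a b c d", "a x c")` counts one substitution and one deletion against four reference words, giving 0.5.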



Introduction

Automatic speech recognition is a function that has been the subject of extensive research for decades. Developed speech recognition tools can recognize speech with almost human-like accuracy, depending on the dataset and benchmark test used [1]. Such performance can be achieved, however, only when the system recognizes the speech of native speakers (i.e., native speakers of the language represented by the dataset used to train the ASR system); for non-native speakers, accuracy drops considerably. The main reason for this drop is the presence of patterns related to the speaker's mother tongue which influence the pronunciation of the second language. This biases their speech to some extent, which causes recognition errors.

