Accented Speech Recognition Based on End-to-End Domain Adversarial Training of Neural Networks

Hyeong-Ju Na, Jeong-Sik Park

doi:10.3390/app11188412

Journal: Applied Sciences
Publication Date: Sep 10, 2021
Citations: 13
License type: CC BY 4.0

Affiliation: Hankuk University of Foreign Studies

Abstract

The performance of automatic speech recognition (ASR) can degrade on accented speech because accented speech differs linguistically from standard speech. Conventional studies on accented speech recognition have used the accent embedding method, in which accent embedding features are fed directly into the ASR network. Although this method improves accented speech recognition, it has restrictions, such as increased computational cost. This study proposes an efficient method of training an ASR model for accented speech in a domain adversarial way, based on the Domain Adversarial Neural Network (DANN). DANN is a domain adaptation technique for settings in which the training data and test data have different distributions. Our approach is therefore expected to yield a reliable ASR model for accented speech by reducing the distribution differences between accented speech and standard speech. DANN has three sub-networks: the feature extractor, the domain classifier, and the label predictor. To adapt DANN to accented speech recognition, we constructed these three sub-networks independently, considering the characteristics of accented speech. In particular, we used an end-to-end framework based on Connectionist Temporal Classification (CTC) to develop the label predictor, an important module that directly affects ASR results. To verify the proposed approach, we conducted accented speech recognition experiments on four English accents: Australian, Canadian, British (England), and Indian. The experimental results showed that the proposed DANN-based model outperformed the baseline model for all accents, indicating that end-to-end domain adversarial training effectively reduced the distribution differences between accented speech and standard speech.
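The abstract names CTC as the basis of the label predictor. As a rough illustration of what a CTC output layer implies at decoding time, the sketch below shows the standard CTC collapsing rule (merge consecutive repeated labels, then drop blanks) together with a simple greedy decoder. The blank symbol `_`, the function names, and the toy alphabet are assumptions made for this example; the paper's actual decoder is not described in this summary.

```python
from itertools import groupby

BLANK = "_"  # assumed symbol for the CTC blank label


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a frame-level label path to an output sequence:
    merge consecutive repeats first, then drop blank labels."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Pick the highest-scoring label in each frame, then collapse the path.
    frame_scores is a list of per-frame score lists aligned with alphabet."""
    path = []
    for scores in frame_scores:
        best_index = max(range(len(scores)), key=scores.__getitem__)
        path.append(alphabet[best_index])
    return "".join(ctc_collapse(path, blank))
```

For instance, the frame path `hh_e_ll_lo__` collapses to `hello`: the blank between the two `l` runs is what keeps the double letter from being merged into one.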

Highlights

The US accent was chosen as the source domain, and the other four accents were treated as target domains, because the quantity of US data is much larger than that of the other accents
This study proposed an efficient accented speech recognition approach using end-to-end domain adversarial training of neural networks based on the Domain Adversarial Neural Network (DANN)
We proposed an efficient DANN model architecture designed specifically for accented speech recognition
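To make the source/target split and the adversarial signal concrete, here is a minimal sketch in plain Python. It assumes binary domain labels (0 for the US source, 1 for any target accent) and the usual DANN gradient reversal rule, in which the domain classifier's gradient is multiplied by a negative factor before it reaches the feature extractor. The label scheme, the function names, and the weight `lam` are illustrative assumptions, not details taken from the paper.

```python
# Assumption: binary domain labels (0 = source, 1 = target). The paper's
# actual labeling scheme is not given in this summary.
SOURCE_ACCENT = "US"
TARGET_ACCENTS = ("Australian", "Canadian", "British (England)", "Indian")


def domain_label(accent):
    """Return 0 for the source accent and 1 for any target accent."""
    if accent == SOURCE_ACCENT:
        return 0
    if accent in TARGET_ACCENTS:
        return 1
    raise ValueError(f"unknown accent: {accent}")


def reverse_gradient(grad, lam):
    """Backward pass of a gradient reversal layer: scale by -lam.
    Its forward pass is the identity, so it is not shown here."""
    return [-lam * g for g in grad]


def feature_extractor_gradient(grad_label, grad_domain, lam):
    """Gradient reaching the feature extractor's output: the label
    predictor's gradient plus the reversed domain classifier's gradient."""
    reversed_domain = reverse_gradient(grad_domain, lam)
    return [gl + gd for gl, gd in zip(grad_label, reversed_domain)]
```

With `lam = 0` the feature extractor is trained on the label loss alone. A positive `lam` pushes the features toward being uninformative about the domain, which is the mechanism by which this kind of training reduces the distribution gap between source and target speech.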

Summary

Introduction

Many studies have proposed methods to improve the performance of accented speech recognition. Early approaches focused on adapting models from standard speech to accented speech, and neural network-based approaches have since been widely used. This study applied the Domain Adversarial Neural Network (DANN), which has been widely used in computer vision studies [8], as a domain adaptation technique for accented speech recognition, and proposed an end-to-end domain adversarial training framework for this task.

Conventional Studies on Accented Speech Recognition

Domain Adaptation for Accented Speech Recognition

Domain Adversarial Neural Network

End-to-End

Feature Extractor

Domain Classifier

Label Predictor

Speech Corpus

Hyperparameters

Results

Architecture

Conclusions

Similar Papers

Investigation of Automatic Speech Recognition Performance and Mean Opinion Scores for Different Standard Speech and Audio Codecs
A V Ramana ... Mythili Sharan Pala
IETE Journal of Research | VOL. 58 | 01 Mar 2012

Combined speech enhancement and auditory modelling for robust distributed speech recognition
Ronan Flynn ... Edward Jones
Speech Communication | VOL. 50 | 20 May 2008

Estimation of speech recognition performance in noisy and reverberant environments using PESQ score and acoustic parameters
Takahiro Fukumori ... Takanobu Nishiura
01 Oct 2013

Adaptive Attention Network with Domain Adversarial Training for Multi-Accent Speech Recognition
Yanbing Yang ... Qingzhi Hou
11 Dec 2022
