Monotonic Gaussian regularization of attention for robust automatic speech recognition

Yeqian Du, Minghui Wu, Xin Fang, Zhouwang Yang

doi:10.1016/j.csl.2022.101405

Abstract

Attention-based Encoder–Decoder (AED) models are among the most popular models for Automatic Speech Recognition (ASR). However, AED models can be unstable, producing errors such as spurious insertions or word repetitions, because their attention violates the monotonic alignment property inherent in speech recognition. To address these problems, we propose a monotonic Gaussian regularization method to guide attention training, in which the guiding map is a sequence of Gaussian distributions whose centers move monotonically. Experiments show that our method reduces the insertion error rate by 7% relative on the HKUST dataset, by 20% and 16% relative on two large industrial datasets, and by 21% relative on an out-of-domain test set. The overall Character Error Rates (CERs) are reduced on all of these sets at the same time, indicating that the model's recognition ability is maintained. Our proposed method therefore improves model performance by enforcing monotonic alignment and makes recognition more robust.
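The abstract does not specify the exact form of the guiding map or of the regularization loss, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that (a) the Gaussian center for decoder step i moves linearly, and therefore monotonically, from the first to the last encoder frame; (b) positions are normalized to [0, 1] with a width sigma = 0.2; and (c) the penalty is the mean attention mass falling outside the Gaussian band, mean(A * (1 - G)), a form borrowed from guided-attention losses used in text-to-speech. The function names `monotonic_gaussian_map` and `monotonic_gaussian_loss` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def monotonic_gaussian_map(num_out: int, num_in: int, sigma: float = 0.2) -> Matrix:
    """Guide map G[i][j] for decoder step i and encoder frame j.

    ASSUMPTION: the center moves linearly (hence monotonically) across the
    normalized input axis as the decoder advances; each row peaks at 1.0.
    """
    guide = []
    for i in range(num_out):
        center = i / max(num_out - 1, 1)  # non-decreasing in i
        row = []
        for j in range(num_in):
            pos = j / max(num_in - 1, 1)  # normalized encoder position
            row.append(math.exp(-((pos - center) ** 2) / (2 * sigma ** 2)))
        guide.append(row)
    return guide


def monotonic_gaussian_loss(attn: Matrix, guide: Matrix) -> float:
    """Mean attention mass outside the Gaussian band: mean(A * (1 - G)).

    ASSUMPTION: a guided-attention-style penalty; the paper's loss may differ.
    Each row of attn is a softmax distribution over encoder frames.
    """
    total, count = 0.0, 0
    for a_row, g_row in zip(attn, guide):
        for a, g in zip(a_row, g_row):
            total += a * (1.0 - g)
            count += 1
    return total / count


if __name__ == "__main__":
    g = monotonic_gaussian_map(3, 5)
    # A monotonic (diagonal) alignment vs. one stuck on the first frame,
    # the kind of failure that tends to produce repetitions.
    diagonal = [[1.0 if j == 2 * i else 0.0 for j in range(5)] for i in range(3)]
    stuck = [[1.0 if j == 0 else 0.0 for j in range(5)] for _ in range(3)]
    print(monotonic_gaussian_loss(diagonal, g))  # 0.0
    print(monotonic_gaussian_loss(stuck, g))     # positive penalty
```

The diagonal alignment costs nothing, while the stuck alignment is penalized in proportion to how far its attention lies from the moving center. In training, such a term would typically be added to the AED objective with a tunable weight; that weight, like sigma, is a hypothetical hyperparameter here.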

More From: Computer Speech & Language

Journal: Computer Speech & Language	Publication Date: May 30, 2022
Citations: 1

Similar Papers

An Effective Learning Method for Automatic Speech Recognition in Korean CI Patients’ Speech
Jiho Jeong ... Sangmin Lee
Electronics | VOL. 10
29 Mar 2021

Adversarial Attack and Defense for Commercial Black-box Chinese-English Speech Recognition Systems
Xuejing Yuan ... Xinqi Ling
ACM Transactions on Privacy and Security
07 Nov 2024

ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture
Gaofeng Cheng ... Haoran Miao
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 30
01 Jan 2021

Semi-supervised End-to-end Speech Recognition Using Text-to-speech and Autoencoders
Shigeki Karita ... Shinji Watanabe
01 May 2019
