Character-Aware Attention-Based End-to-End Speech Recognition

Zhong Meng, Yifan Gong, Jinyu Li, Yashesh Gaur

doi:10.1109/asru46091.2019.9004018

Abstract

Predicting words and subword units (WSUs) as the output has been shown to be effective for the attention-based encoder-decoder (AED) model in end-to-end speech recognition. However, as one input to the decoder recurrent neural network (RNN), each WSU embedding is learned independently through context and acoustic information in a purely data-driven fashion. Little effort has been made to explicitly model the morphological relationships among WSUs. In this work, we propose a novel character-aware (CA) AED model in which each WSU embedding is computed by summarizing the embeddings of its constituent characters using a CA-RNN. This WSU-independent CA-RNN is jointly trained with the encoder, the decoder, and the attention network of a conventional AED to predict WSUs. With CA-AED, the embeddings of morphologically similar WSUs are naturally and directly correlated through the CA-RNN, in addition to the semantic and acoustic relations modeled by a traditional AED. Moreover, CA-AED significantly reduces the number of model parameters relative to a traditional AED by replacing the large pool of WSU embeddings with a much smaller set of character embeddings. On a 3400-hour Microsoft Cortana dataset, CA-AED achieves up to 11.9% relative WER improvement over a strong AED baseline with 27.1% fewer model parameters.
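
The idea of computing a WSU embedding from its characters can be illustrated with a minimal sketch. The abstract does not specify the CA-RNN architecture or any dimensions, so everything below is an assumption made for illustration: a plain tanh RNN stands in for the CA-RNN, the final hidden state is taken as the WSU embedding, and the class name `CharAwareEmbedder`, the character set, the layer sizes, and the 30,000-entry vocabulary are all hypothetical. The weights are random rather than trained, so the sketch shows only the data flow and the parameter-count argument, not the reported results.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Return a small random matrix as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class CharAwareEmbedder:
    """Toy stand-in for a CA-RNN (assumption: a plain tanh RNN, untrained)."""

    def __init__(self, chars, char_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.char_index = {c: i for i, c in enumerate(chars)}
        self.hidden_dim = hidden_dim
        self.char_emb = make_matrix(len(chars), char_dim, rng)  # one row per character
        self.w_in = make_matrix(hidden_dim, char_dim, rng)      # input-to-hidden weights
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)   # hidden-to-hidden weights
        self.bias = [0.0] * hidden_dim

    def embed(self, unit):
        """Run the RNN over the characters of `unit` and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        for ch in unit:
            x = self.char_emb[self.char_index[ch]]
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_in[j], x))
                    + sum(w * hi for w, hi in zip(self.w_rec[j], h))
                    + self.bias[j]
                )
                for j in range(self.hidden_dim)
            ]
        return h

    def num_parameters(self):
        """Count character embeddings plus RNN weights and biases."""
        matrices = [self.char_emb, self.w_in, self.w_rec]
        return sum(len(m) * len(m[0]) for m in matrices) + len(self.bias)


def lookup_table_parameters(vocab_size, emb_dim):
    """Parameters of a conventional embedding table with one row per WSU."""
    return vocab_size * emb_dim


# Hypothetical character set: 26 lowercase letters plus the apostrophe.
embedder = CharAwareEmbedder("abcdefghijklmnopqrstuvwxyz'", char_dim=8, hidden_dim=16)

# "walk" and "walked" share the same character embeddings and RNN weights,
# so their embeddings are tied through shared parameters by construction.
walk = embedder.embed("walk")
walked = embedder.embed("walked")

ca_params = embedder.num_parameters()
table_params = lookup_table_parameters(vocab_size=30000, emb_dim=16)  # hypothetical vocabulary
print(ca_params, table_params)
```

In this toy setting the character-based embedder needs a fixed number of parameters no matter how many WSUs are in the vocabulary, whereas the lookup table grows linearly with vocabulary size. That is the mechanism behind the parameter reduction the abstract describes, though the actual 27.1% figure depends on the full model, which is not reproduced here.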
