Improved End-to-End Speech Recognition Using Adaptive Per-Dimensional Learning Rate Methods

Xuyang Wang, Pengyuan Zhang, Yonghong Yan, Jielin Pan, Qingwei Zhao

doi:10.1587/transinf.2016sll0001

Abstract

The introduction of deep neural networks (DNNs) has led to a significant improvement in automatic speech recognition (ASR) performance. However, the overall ASR system remains complex because of its dependence on the hidden Markov model (HMM). Recently, a new end-to-end ASR framework has been proposed that uses recurrent neural networks (RNNs) to directly model context-independent targets with the connectionist temporal classification (CTC) objective function, and it achieves results comparable to those of the hybrid HMM/DNN system. In this paper, we investigate per-dimensional learning rate methods, including ADAGRAD and ADADELTA, to improve the recognition performance of the end-to-end system. The motivation is that the blank symbol used in CTC dominates the output, and these methods assign smaller learning rates to frequent features. Experimental results show that ADADELTA achieves a relative word error rate (WER) reduction of more than 4% and a 5% absolute improvement in label accuracy on the training set, while requiring fewer training epochs.
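
As a rough illustration of why per-dimensional methods damp frequently updated dimensions, the sketch below implements the commonly published ADAGRAD and ADADELTA update rules in plain Python. It is not the authors' implementation: the function names, the hyperparameter defaults (lr, rho, eps), and the toy gradient pattern (dimension 0 updated every step, like a dominant blank output; dimension 1 updated every tenth step) are assumptions made only for this example.

```python
import math


def adagrad_step(params, grads, accum, lr=0.01, eps=1e-8):
    """Apply one ADAGRAD update in place (hypothetical helper).

    accum[i] holds the running sum of squared gradients for dimension i,
    so dimensions that receive gradients often get a smaller step size.
    """
    for i, g in enumerate(grads):
        accum[i] += g * g
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)


def adadelta_step(params, grads, avg_sq_grad, avg_sq_delta, rho=0.95, eps=1e-6):
    """Apply one ADADELTA update in place (hypothetical helper).

    Uses decaying averages of squared gradients and squared updates,
    so no global learning rate has to be chosen.
    """
    for i, g in enumerate(grads):
        avg_sq_grad[i] = rho * avg_sq_grad[i] + (1.0 - rho) * g * g
        delta = -math.sqrt(avg_sq_delta[i] + eps) / math.sqrt(avg_sq_grad[i] + eps) * g
        avg_sq_delta[i] = rho * avg_sq_delta[i] + (1.0 - rho) * delta * delta
        params[i] += delta


def simulate_adagrad(steps=100, lr=0.01):
    """Toy run: dimension 0 is 'frequent', dimension 1 is 'rare' (assumed pattern)."""
    params, accum = [0.0, 0.0], [0.0, 0.0]
    for step in range(steps):
        grads = [1.0, 1.0 if step % 10 == 0 else 0.0]
        adagrad_step(params, grads, accum, lr=lr)
    # Effective per-dimension step size after the run.
    rates = [lr / math.sqrt(a) for a in accum]
    return params, accum, rates
```

Calling simulate_adagrad() returns an effective step size for the frequent dimension that is smaller than the one for the rare dimension, which is the behaviour the abstract relies on when the blank symbol dominates the CTC output.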

Journal: IEICE Transactions on Information and Systems	Publication Date: Jan 1, 2016
Citations: 4	License type: free

Similar Papers

Theoretical Analysis of Diversity in an Ensemble of Automatic Speech Recognition Systems
Kartik Audhkhasi ... Shrikanth S Narayanan
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 22
01 Mar 2014

Improving Hybrid CTC/Attention Architecture with Time-Restricted Self-Attention CTC for End-to-End Speech Recognition
Long Wu ... Ta Li
Applied Sciences | VOL. 9
31 Oct 2019

Advancing Acoustic-to-Word CTC Model With Attention and Mixed-Units
Amit Das ... Yifan Gong
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 27
04 Sep 2019

A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks
Jun Du ... Yanhui Tu
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 24
01 Aug 2016
