Deep Group Residual Convolutional CTC Networks for Speech Recognition

Kai Wang, Bohan Li, Donghai Guan

doi:10.1007/978-3-030-05090-0_27

Abstract

End-to-end deep neural networks have been widely used in the literature to model 2D correlations in the audio signal. Both Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks have shown improvements across a wide variety of speech recognition tasks. In particular, CNNs effectively exploit local temporal and spectral correlations to gain translation invariance. However, all CNNs used in existing work assume that each channel's feature map is independent of the others, which may not fully utilize and combine information about the input features. Moreover, most CNNs in the literature are shallow and may not be deep enough to capture all the information in human speech signals. In this paper, we propose a novel neural network, denoted GRCNN-CTC, which integrates group residual convolutional blocks and recurrent layers paired with the Connectionist Temporal Classification (CTC) loss. Experimental results show that the proposed GRCNN-CTC achieves improvements of 1.11% in Word Error Rate (WER) and 0.48% in Character Error Rate (CER) on a subset of the LibriSpeech dataset compared to the baseline automatic speech recognition (ASR) system. In addition, our model greatly reduces computational overhead and converges faster, which makes it possible to scale up to deeper architectures.
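The two metrics reported above, WER and CER, are conventionally defined as the edit distance between a reference transcript and a hypothesis, divided by the reference length in words or characters. The following is a minimal sketch of that standard definition, not the evaluation code used in the paper; the function names `edit_distance`, `wer`, and `cer` are hypothetical, and it assumes transcripts are already normalized and non-empty.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length.

    Assumption: spaces count as characters; some toolkits strip them first.
    """
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, with the reference "the cat sat" and the hypothesis "the cat sit", one of three words is wrong, so `wer` returns 1/3, while one of eleven characters is wrong, so `cer` returns 1/11.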

Similar Papers

End-to-End Low-Resource Speech Recognition with a Deep CNN-LSTM Encoder
Weizhe Wang ... Xiaodong Yang
01 Sep 2020

CNN-Self-Attention-DNN Architecture For Mandarin Recognition
Chengtao Cai ... Dongning Guo
01 Aug 2020

Chapter 2 - End-to-End Acoustic Modeling Using Convolutional Neural Networks
Vishal Passricha ... Rajesh Kumar Aggarwal
Intelligent Speech Signal Processing
01 Jan 2019

Bottleneck and Embedding Representation of Speech for DNN-based Language and Speaker Recognition
Alicia Lozano-Diez ... Joaquin Gonzalez-Rodriguez
21 Nov 2018
