Abstract

In this paper, we present a factored network-based acoustic modeling framework with various deep convolutional recurrent neural network (RNN) architectures for noise-robust automatic speech recognition (ASR). We previously proposed a deep convolutional neural network (CNN)-based framework as the factored network-based acoustic model. Deep CNNs emphasize the spatial locality of input speech features but cannot analyze the properties of long-term speech feature sequences. We therefore introduce into our factored network-based acoustic modeling framework various deep convolutional RNN architectures that achieve both spatial locality and long-term analysis. Comparative evaluations show that the proposed method improves ASR accuracy in noisy environments.
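To make the combination concrete, the following is a minimal sketch (not the paper's actual architecture) of a convolutional RNN acoustic model in PyTorch: a small CNN front-end captures the spatial locality of the input features, and an LSTM back-end models long-term temporal context. All layer sizes, the feature dimension `n_mels`, and the output size `n_states` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvRNNAcousticModel(nn.Module):
    """Hypothetical CNN + RNN acoustic model sketch (sizes are illustrative)."""
    def __init__(self, n_mels=40, n_states=2000):
        super().__init__()
        # CNN front-end: emphasizes spatial locality of the input features
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),  # pool along the frequency axis only
        )
        # RNN back-end: models long-term dependencies in the feature sequence
        self.rnn = nn.LSTM(32 * (n_mels // 2), 256, batch_first=True)
        self.out = nn.Linear(256, n_states)  # frame-level state scores

    def forward(self, x):  # x: (batch, time, n_mels)
        h = self.cnn(x.unsqueeze(1))          # (batch, 32, time, n_mels // 2)
        h = h.permute(0, 2, 1, 3).flatten(2)  # (batch, time, 32 * n_mels // 2)
        h, _ = self.rnn(h)                    # (batch, time, 256)
        return self.out(h)                    # (batch, time, n_states)

model = ConvRNNAcousticModel()
scores = model(torch.randn(2, 100, 40))  # 2 utterances, 100 frames each
print(scores.shape)  # torch.Size([2, 100, 2000])
```

The point of the structure is the division of labor: convolution and frequency-axis pooling localize spectral patterns per frame, while the recurrent layer integrates evidence across the whole utterance, which neither component provides on its own.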
