Abstract

In this paper, we present a factored network-based acoustic modeling framework with various deep convolutional recurrent neural network (RNN) architectures for noise-robust automatic speech recognition (ASR). We previously proposed a deep convolutional neural network (CNN)-based framework as the factored network-based acoustic model. Deep CNNs emphasize the spatial locality of input speech features but cannot analyze the properties of long-term speech feature sequences. We therefore introduce into our factored network-based acoustic modeling framework various deep convolutional RNN architectures that achieve both spatial locality and long-term analysis. Comparative evaluations show that the proposed method improves ASR accuracy in noisy environments.
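To make the combination concrete, the following is a minimal sketch (not the paper's actual architecture) of a convolutional RNN acoustic model in PyTorch: a small CNN front-end captures the spatial locality of the input features, and an LSTM back-end models long-term temporal context. All layer sizes, the feature dimension `n_mels`, and the output size `n_states` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvRNNAcousticModel(nn.Module):
    """Hypothetical CNN + RNN acoustic model sketch (sizes are illustrative)."""
    def __init__(self, n_mels=40, n_states=2000):
        super().__init__()
        # CNN front-end: emphasizes spatial locality of the input features
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),  # pool along the frequency axis only
        )
        # RNN back-end: models long-term dependencies in the feature sequence
        self.rnn = nn.LSTM(32 * (n_mels // 2), 256, batch_first=True)
        self.out = nn.Linear(256, n_states)  # frame-level state scores

    def forward(self, x):  # x: (batch, time, n_mels)
        h = self.cnn(x.unsqueeze(1))          # (batch, 32, time, n_mels // 2)
        h = h.permute(0, 2, 1, 3).flatten(2)  # (batch, time, 32 * n_mels // 2)
        h, _ = self.rnn(h)                    # (batch, time, 256)
        return self.out(h)                    # (batch, time, n_states)

model = ConvRNNAcousticModel()
scores = model(torch.randn(2, 100, 40))  # 2 utterances, 100 frames each
print(scores.shape)  # torch.Size([2, 100, 2000])
```

The point of the structure is the division of labor: convolution and frequency-axis pooling localize spectral patterns per frame, while the recurrent layer integrates evidence across the whole utterance, which neither component provides on its own.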
