Abstract
In this paper, we present a factored network-based acoustic modeling framework with various deep convolutional recurrent neural network (RNN) architectures for noise-robust automatic speech recognition (ASR). We have previously proposed a deep convolutional neural network (CNN)-based framework as the factored network-based acoustic model. Deep CNNs can emphasize the spatial locality of input speech features, but they cannot analyze the properties of long-term speech feature sequences. We therefore introduce various deep convolutional RNN architectures, which capture both spatial locality and long-term dependencies, into our factored network-based acoustic modeling framework. Comparative evaluations show that the proposed method successfully improves ASR accuracy in noisy environments.
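The complementary roles described above can be illustrated with a toy NumPy sketch: a 1-D convolution picks up local (spatially adjacent) patterns in a time-frequency feature sequence, and a simple Elman-style recurrence then summarizes the resulting sequence over its full length. This is only a minimal illustration of the CNN-plus-RNN idea, not the paper's factored network; the function names, shapes, and weights are hypothetical.

```python
import numpy as np

def conv1d(x, w):
    """Valid 1-D convolution over the time axis.
    x: (T, F) speech feature sequence (e.g. T frames of F filterbank bins).
    w: (K, F) local kernel spanning K frames and all F bins.
    Returns a (T-K+1,) sequence of local-pattern responses."""
    T, F = x.shape
    K = w.shape[0]
    return np.array([np.sum(x[t:t + K] * w) for t in range(T - K + 1)])

def simple_rnn(seq, w_in, w_rec):
    """Elman-style recurrence: folds the whole (possibly long) sequence
    of convolution outputs into one hidden state, so information from
    early frames can influence the final representation."""
    h = np.zeros(w_rec.shape[0])
    for v in seq:
        h = np.tanh(w_in * v + w_rec @ h)
    return h

rng = np.random.default_rng(0)
feats = rng.standard_normal((20, 40))        # 20 frames, 40 feature bins
kernel = rng.standard_normal((5, 40))        # local 5-frame receptive field
local = conv1d(feats, kernel)                # spatial locality (CNN part)
state = simple_rnn(local,                    # long-term summary (RNN part)
                   rng.standard_normal(8),
                   0.1 * rng.standard_normal((8, 8)))
```

In a real acoustic model the convolution would have many channels and pooling, and the recurrence would typically be an LSTM or GRU stack, but the division of labor is the same: local pattern extraction followed by long-term sequence modeling.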