Abstract

In this paper, we propose leveraging end-to-end automatic speech recognition (ASR) systems to assist deep neural network-hidden Markov model (DNN-HMM) hybrid ASR systems. The DNN-HMM hybrid ASR system, which is composed of an acoustic model, a language model, and a pronunciation model, is known to be the most practical architecture in the ASR field. On the other hand, recent studies have paid much attention to end-to-end ASR systems that are composed entirely of neural networks. These systems are known to yield comparable performance without introducing heuristic operations. However, one problem is that end-to-end ASR systems sometimes suffer from redundant generation and omission of important words in the text generation phase. This is because these systems cannot explicitly consider the connection between the input speech and the output text. Therefore, our idea is to regard end-to-end ASR systems as neural speech-to-text language models (NS2TLMs) and to use them for rescoring hypotheses generated by the DNN-HMM hybrid ASR systems. This enables us to leverage the end-to-end ASR systems while avoiding the generation issues, because the DNN-HMM hybrid ASR systems generate speech-aligned hypotheses. The NS2TLMs are expected to improve the DNN-HMM hybrid ASR systems because the end-to-end ASR systems correctly handle short-duration utterances. In our experiments, we use state-of-the-art DNN-HMM hybrid ASR systems with convolutional and long short-term memory recurrent neural network acoustic models, and end-to-end ASR systems based on an attentional encoder-decoder. We demonstrate that our proposed method yields better ASR performance than both the DNN-HMM hybrid ASR system and the end-to-end ASR system.
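The rescoring idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the hybrid score and the NS2TLM score are combined, so the linear interpolation in the log domain, the `weight` parameter, the function name `rescore_nbest`, and the `ns2tlm_logprob` callable are all assumptions made for illustration and are not the authors' implementation.

```python
from typing import Callable, List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    ns2tlm_logprob: Callable[[str], float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank hybrid-system hypotheses using an NS2TLM score.

    nbest: (hypothesis text, hybrid log-score) pairs for one utterance.
    ns2tlm_logprob: hypothetical callable returning the end-to-end model's
        log-probability of a hypothesis given the same utterance.
    weight: assumed interpolation weight (0 keeps the hybrid ranking).
    """
    rescored = [
        # Assumed combination rule: weighted sum of the two log-scores.
        (text, (1.0 - weight) * hybrid + weight * ns2tlm_logprob(text))
        for text, hybrid in nbest
    ]
    # Best (highest combined score) hypothesis first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy scores invented for illustration; they are not from the paper.
    hypotheses = [("recognize speech", -10.0), ("wreck a nice beach", -9.5)]
    toy_ns2tlm = {"recognize speech": -2.0, "wreck a nice beach": -6.0}
    best_text, best_score = rescore_nbest(hypotheses, toy_ns2tlm.__getitem__)[0]
    print(best_text, best_score)
```

Because the candidate word sequences come only from the hybrid system, the end-to-end model never generates text freely in this setup; it only assigns scores to hypotheses that are already aligned to the speech.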
