Comparison of Hidden Markov Model and Recurrent Neural Network in Automatic Speech Recognition

Akshay Madhav Deshmukh

doi:10.24018/ejers.2020.5.8.2077

Abstract

Understanding human speech precisely by a machine has been a major challenge for many years. Automatic Speech Recognition (ASR) is decades old, and although the technology has not yet reached the point where machines understand all speech, it is used on a regular basis in many applications and services. Hence, to advance research, it is important to identify significant research directions, specifically those that have not been pursued or funded in the past. The performance of such ASR systems, traditionally built upon a Hidden Markov Model (HMM), has improved due to the application of Deep Neural Networks (DNNs). Despite this progress, building an ASR system has remained a challenging task requiring multiple resources and training stages. The idea of using DNNs for Automatic Speech Recognition has moved from being a single component in a pipeline to building a system mainly based on such a network. This paper provides a literature survey of state-of-the-art research on two major models, namely the Deep Neural Network - Hidden Markov Model (DNN-HMM) and Recurrent Neural Networks trained with Connectionist Temporal Classification (RNN-CTC). It also describes the differences between these two models at the architectural level.

Highlights

The technology of Automatic Speech Recognition (ASR) enables a system to recognize human speech and produce the corresponding output.
This paper provides a literature survey of state-of-the-art research on two major models, namely the Deep Neural Network - Hidden Markov Model (DNN-HMM) and Recurrent Neural Networks trained with Connectionist Temporal Classification (RNN-CTC).
While the baseline system improves only slightly from the "14 Hr" to the "81 Hr" training set, the error rate of the RNN declines sharply.

Summary

INTRODUCTION

The technology of Automatic Speech Recognition (ASR) enables a system to recognize human speech and produce the corresponding output. The system processes a speech waveform that represents the words of the sentence, as well as the vocalized pauses, in the spoken input. It decodes the speech to provide the best-fitting sentence. To do so, it converts the speech signal into a sequence of vectors measured over the duration of the speech signal. The ASR systems available today do not need a long period of speech training and can successfully recognize continuous speech with a large vocabulary at a high accuracy rate.
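The conversion of a speech signal into a sequence of vectors can be illustrated with a minimal sketch. It is not the front end used in the paper: practical systems typically compute richer features such as filterbank or cepstral coefficients. The function name `frame_features`, the 25 ms window, the 10 ms hop, and the two toy features (log energy and zero-crossing rate) are all assumptions chosen for illustration.

```python
import math


def frame_features(samples, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Split a waveform into overlapping frames and return one small
    feature vector (log energy, zero-crossing rate) per frame.

    Hypothetical helper for illustration only; real ASR front ends
    usually compute filterbank or cepstral features instead.
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)      # shift between frames
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        # Mean power of the frame; the small constant avoids log(0).
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-10)
        # Fraction of adjacent sample pairs whose sign differs.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        zcr = crossings / (frame_len - 1)
        features.append((log_energy, zcr))
    return features


# One second of a synthetic 440 Hz tone stands in for recorded speech.
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
vectors = frame_features(tone)
```

Each element of `vectors` corresponds to one short window of the signal, so the length of the sequence grows with the duration of the utterance. Sequences of this kind are what acoustic models, whether HMM-based or RNN-based, take as input.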

Infrastructure

Models and Algorithms

Metadata

HIDDEN MARKOV MODEL SYSTEMS

Deep Neural Network Hybrids

Training of a DNN-HMM Based System

RECURRENT NEURAL NETWORK SYSTEMS

Recurrent Neural Networks

Decoding

Decoding with a Language Model

EXPERIMENTS

CONCLUSION


Journal: European Journal of Engineering Research and Science
Publication Date: Aug 31, 2020
Citations: 8
License type: CC BY 4.0

European Journal of Engineering and Technology Research | VOL. 5