Novel Image Preprocessing Approach for Automatic Speech Recognition

Amr Gody, Youssra Emam, Nashaat Hussein

doi:10.21608/ejle.2018.60081

Open Access

https://doi.org/10.21608/ejle.2018.60081

Abstract

This research proposes a novel approach to automatic speech recognition (ASR) based on image recognition. It introduces a hybrid 2D-Image (2DI)-Hidden Markov Model (HMM) approach to handle the preprocessing classification task in an ASR system, and this classification task is the focus of the work. Because the proposed approach is novel and covers only one task within the whole ASR pipeline, it is evaluated by relative comparison with other popular approaches that perform the same task on the same database. A hybrid Gaussian Mixture Model (GMM)-HMM with Mel Frequency Cepstral Coefficient (MFCC) features provides the reference results. The research also introduces a new method of mapping the speech signal into a two-dimensional space. The speech stream is segmented, and the frequency content of each segment is projected into the frequency domain using a balanced tree-structured filter implemented with the wavelet packets technique. The tree structure is captured as an image, and a database is constructed from the encoded images. The images are then segregated into speech classes. Hybrid Discrete Cosine Transform (DCT) based features are used for image encoding, with an HMM as the class model, and this model is evaluated against MFCC-HMM on the same classification problem. The proposed hybrid model gives better-balanced results than MFCC-HMM across the different classes. The classes considered in this research are vowels, consonants, plosives, and silence. The KED-TIMIT corpus is used as the source of speech data. The approach shows promising results, especially in silence and vowel detection.
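The mapping described in the abstract (segment the speech, decompose each segment with a balanced wavelet packet tree, capture the result as an image, then take DCT-based features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Haar filter, the frame length of 256 samples, the tree depth of 4, the use of log sub-band energy as pixel values, and all function names are assumptions chosen only to keep the example short and self-contained.

```python
import numpy as np

def haar_split(x):
    """One Haar analysis step: return (approximation, detail) at half length."""
    x = x[: len(x) // 2 * 2]          # drop a trailing sample if length is odd
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def packet_leaves(frame, depth):
    """Balanced wavelet packet tree: every node is split at every level."""
    nodes = [np.asarray(frame, dtype=float)]
    for _ in range(depth):
        nodes = [band for node in nodes for band in haar_split(node)]
    return nodes                       # 2**depth sub-bands, in tree (not frequency) order

def frames_to_image(signal, frame_len=256, depth=4):
    """Rows are sub-bands, columns are frames, values are log sub-band energy."""
    n_frames = len(signal) // frame_len
    image = np.empty((2 ** depth, n_frames))
    for t in range(n_frames):
        frame = signal[t * frame_len:(t + 1) * frame_len]
        image[:, t] = [np.log(np.sum(b ** 2) + 1e-12)
                       for b in packet_leaves(frame, depth)]
    return image

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    mat[0, :] /= np.sqrt(2.0)
    return mat

def dct_features(image, keep=4):
    """2D DCT of the image; keep the low-order keep x keep block as features."""
    coeffs = dct_matrix(image.shape[0]) @ image @ dct_matrix(image.shape[1]).T
    return coeffs[:keep, :keep].ravel()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(2048)     # stand-in for a speech segment
    img = frames_to_image(speech)
    print(img.shape, dct_features(img).shape)
```

In a full system the feature vectors produced this way would be passed to one HMM per speech class. That stage is omitted here because the source does not give its configuration.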

Highlights

Automatic Speech Recognition (ASR) is the task of converting a speech utterance into a text script.
This paper introduces a novel approach to a preprocessing task that is intended to enhance overall automatic speech recognition.
Applying vector quantization to BTEI degraded the result significantly, from 70.9% to 43.8%. This is not the case when vector quantization is applied to Mel Frequency Cepstral Coefficient (MFCC) features.
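To make the vector quantization step concrete, the following minimal sketch shows how continuous feature vectors are replaced by indices into a small codebook, which is the form of input a discrete HMM works with. The k-means training loop, the codebook size of 8, and the function names are assumptions for illustration. The source does not specify how its codebook was built.

```python
import numpy as np

def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codeword (squared Euclidean)."""
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def train_codebook(vectors, n_codes=8, n_iter=20, seed=0):
    """Plain k-means: start from random samples, then alternate assign and update."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(vectors), size=n_codes, replace=False)
    codebook = vectors[picks].astype(float)
    for _ in range(n_iter):
        labels = quantize(vectors, codebook)
        for k in range(n_codes):
            members = vectors[labels == k]
            if len(members):               # leave empty clusters unchanged
                codebook[k] = members.mean(axis=0)
    return codebook

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.standard_normal((100, 4))  # stand-in for per-frame feature vectors
    cb = train_codebook(feats)
    symbols = quantize(feats, cb)          # discrete observation sequence
    print(cb.shape, symbols[:10])
```

Quantization discards everything about a vector except its nearest codeword, so how much accuracy is lost depends on how well a small codebook can represent the feature space.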

Summary

INTRODUCTION

Automatic Speech Recognition (ASR) is the task of converting a speech utterance into a text script. This research aims to provide a preprocessing task that enhances the subsequent ASR tasks. The goals of this paper are: 1- To explore the use of the Two-Dimension Image Encoded (2D image) approach in speech recognition. 2- To evaluate the proposed hybrid model, Two-Dimension Image Encoded 2DI-HMM, against GMM-HMM on the same preprocessing classification task.

Literature Review

Vector Quantization Technique

Best Tree and Entropy

Machine Learning using HMM

TEST CASES AND EXPERIMENTAL MODEL

RESULTS

CONCLUSIONS

Journal: The Egyptian Journal of Language Engineering
Publication Date: Sep 11, 2018
License type: CC BY
