Automatic Database Segmentation using Hybrid Spectrum-Visual Approach

Manar Othman Gbaily

doi:10.21608/ejle.2021.89867.1024

Abstract

Automatic segmentation of speech signals has attracted many researchers worldwide, because many speech processing systems require the speech waveform to be segmented into principal acoustic units. In this research, the TIMIT database (DB) is used to carry out the segmentation and to validate its results. This paper presents a novel method for segmenting speech into phonemes, in which the proposed strategy helps select an appropriate feature extraction technique for speech segmentation. Three main feature extraction techniques are used: the first is Mel Frequency Cepstral Coefficients (MFCC), the second is Best Tree Encoding (BTE), and the third is the Image Normalized Encoder (INE), a hybrid of the Best Tree Image (BTI) and the ResNet-50 Convolutional Neural Network (CNN). A hybrid model combining a Hidden Markov Model (HMM) and a Gaussian Mixture Model (GMM) is then trained on the extracted features to improve automatic speech recognition performance. The proposed model is evaluated against the most widely used feature set, MFCC plus delta and delta-delta coefficients (39 parameters). The approach could be applied to tasks such as automatic speech recognition and automatic language identification. Experimental results show that the BTE technique achieved a higher success rate (𝜂 = 92.64%) than the INE technique. However, the INE technique gives confusion success rates of 97.1% and 99.1% for Tr and NTr, respectively.
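The 39-parameter MFCC baseline mentioned above is conventionally built from 13 static cepstral coefficients per frame, their first-order (delta) coefficients, and their second-order (delta-delta) coefficients. The sketch below shows one common way to assemble that vector, assuming the standard regression formula d_t = Σ_{n=1..N} n·(c_{t+n} − c_{t−n}) / (2·Σ_{n=1..N} n²) with a ±2-frame window and edge-frame repetition. The window size, the padding choice, and the helper names `deltas` and `stack_39` are illustrative assumptions, not details taken from the paper, and computing the 13 static MFCCs themselves is not shown.

```python
import numpy as np


def deltas(features: np.ndarray, N: int = 2) -> np.ndarray:
    """Regression-based delta coefficients over a +/-N frame window (assumed N=2).

    features: array of shape (num_frames, num_coeffs).
    Frames beyond either end are filled by repeating the first/last frame.
    """
    num_frames = features.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Pad N frames on each side in time only, repeating the edge frames.
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(features, dtype=float)
    for n in range(1, N + 1):
        # padded[N + t + n] holds frame t+n; padded[N + t - n] holds frame t-n.
        ahead = padded[N + n : N + n + num_frames]
        behind = padded[N - n : N - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_39(mfcc13: np.ndarray, N: int = 2) -> np.ndarray:
    """Hypothetical helper: 13 static MFCCs -> 39-dim [static, delta, delta-delta]."""
    if mfcc13.ndim != 2 or mfcc13.shape[1] != 13:
        raise ValueError("expected an array of shape (num_frames, 13)")
    d1 = deltas(mfcc13, N)  # first-order (velocity) coefficients
    d2 = deltas(d1, N)      # second-order (acceleration) coefficients
    return np.hstack([mfcc13, d1, d2])
```

For example, a 13-column input whose values rise by 1 per frame yields delta values of 1.0 wherever the full ±2-frame window fits inside the signal, and delta-delta values of 0.0 in the frames far enough from both edges; the edge frames are damped by the repeated-frame padding.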


Journal: The Egyptian Journal of Language Engineering
Publication Date: Sep 1, 2021
License: CC BY

Similar Papers

Integration of articulatory knowledge and voicing features based on DNN/HMM for Mandarin speech recognition
Ying-Wei Tan ... Wei Jiang
01 Jul 2015

Using DTW neural–based MFCC warping to improve emotional speech recognition
Mansour Sheikhan ... Davood Gharavian
Neural Computing and Applications, Vol. 21
15 May 2011

Novel speech processing techniques for robust automatic speech recognition
01 Jan 2006

Power Spectrum Difference Teager Energy Features for Speech Recognition in Noisy Environment
N S Nehe ... R.S Holambe
01 Dec 2008
