Enhanced Automatic Speech Recognition System Based on Enhancing Power-Normalized Cepstral Coefficients

Mohamed Tamazin, Mohamed Khedr, Ahmed Gouda

doi:10.3390/app9102166

Abstract

Many new consumer applications, such as voice command interfaces, speech-to-text applications, and data entry processes, are based on automatic speech recognition (ASR) systems. Although ASR systems have improved remarkably in recent decades, recognition performance still degrades significantly in noisy environments. Developing a robust ASR system that can work in real-world noise and other acoustically distorting conditions is an attractive research topic. Many advanced algorithms have been developed in the literature to deal with this problem; most of them are based on modeling how the human auditory system behaves when it perceives noisy speech. In this research, the power-normalized cepstral coefficient (PNCC) system is modified to increase its robustness against different types of environmental noise: a new technique based on gammatone channel filtering combined with channel bias minimization is used to suppress the noise effects. The TIDIGITS database is used to evaluate the proposed system against state-of-the-art techniques in the presence of additive white Gaussian noise (AWGN) and seven different types of environmental noise. In this research, one word is recognized from a set containing only 11 possibilities. The experimental results show that the proposed method provides significant improvements in recognition accuracy at low signal-to-noise ratios (SNRs). In the case of subway noise at SNR = 5 dB, the proposed method outperforms the mel-frequency cepstral coefficient (MFCC) and relative spectral (RASTA)–perceptual linear predictive (PLP) methods by 55% and 47%, respectively. Moreover, the recognition rate of the proposed method is higher than that of the gammatone frequency cepstral coefficient (GFCC) and PNCC methods in the case of car noise: it is enhanced by 40% in comparison to the GFCC method at SNR = 0 dB, and by 20% in comparison to the PNCC method at SNR = −5 dB.
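
The evaluation described above relies on corrupting clean speech with AWGN at a chosen SNR. The sketch below illustrates one common way to do this, using the definition SNR(dB) = 10 log10(P_signal / P_noise). It is not the authors' code: the function names `add_awgn` and `measured_snr_db`, the 440 Hz test tone, and the 8 kHz sampling rate are all assumptions made for illustration.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise added at `snr_db`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if rng is None:
        rng = random.Random()
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise), solved here for P_noise.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Assumed stand-in for speech: one second of a 440 Hz tone sampled at 8 kHz.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noisy = add_awgn(clean, snr_db=5.0, rng=rng)

# The measured value should be close to the requested 5 dB.
print(round(measured_snr_db(clean, noisy), 1))
```

Because the noise is random, the measured SNR only approximates the target; the match tightens as the signal gets longer. Environmental noises such as subway or car noise would be mixed in from recordings, scaled by the same power ratio, rather than drawn from a Gaussian generator.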

Highlights

Despite advanced signal processing techniques used nowadays, the existing automatic speech recognition (ASR) system still cannot meet the performance of the human auditory system
In the case of car noise, the recognition rate of the proposed method is enhanced by 40% in comparison to the gammatone frequency cepstral coefficient (GFCC) method at a signal-to-noise ratio (SNR) of 0 dB, and by 20% in comparison to the power-normalized cepstral coefficient (PNCC) method at SNR = −5 dB
The performance of these systems for each noise type is shown for clean data and at six SNR levels starting from −5 dB

Summary

Introduction

Despite the advanced signal processing techniques in use today, existing automatic speech recognition (ASR) systems still cannot match the performance of the human auditory system. This has motivated many researchers to develop robust feature extraction techniques. Numerous approaches have been proposed to address these problems; they fall mainly into two categories [1]: the feature-space approach and the model-space approach. Although the model-space approach achieves higher accuracy than the feature-space approach, it requires more computation time. This paper is organized as follows: Section 2 discusses the proposed system in detail; Section 3 presents the experimental work and results; and Section 4 summarizes the outcomes of the paper and future work.

Proposed Enhanced PNCC Algorithm

Normalized

Channel Bias Minimizing

Mean Power Normalization

Power Function Nonlinearity

Final Processing

Experimental

Results and Discussion

Computational Complexity

Conclusions and Future Work

Journal: Applied Sciences	Publication Date: May 27, 2019
Citations: 23	License type: CC BY 4.0