Stereo-based histogram equalization for robust speech recognition

Randa Al-Wakeel, Sherif Abdou, Magdy Aboul-Ela, Mahmoud Shoman

doi:10.1186/s13636-015-0059-4

Abstract

Optimal automatic speech recognition (ASR) takes place when the recognition system is tested under circumstances identical to those in which it was trained. In the real world, however, there are many sources of mismatch between the training environment and the testing environment, among them the noise present in real environments. Speech enhancement techniques have been developed to make ASR systems robust against these noise sources. In this work, a method based on histogram equalization (HEQ) is proposed to compensate for the nonlinear distortions in the speech representation. The approach uses stereo (simultaneous) recordings of clean speech and its corresponding noisy speech to compute a stereo Gaussian mixture model (GMM). The stereo GMM is used to compute the cumulative density function (CDF) for both clean speech and noisy speech using a sigmoid function, instead of the order statistics used in other HEQ-based methods. In the implementation, we show two ways to apply HEQ: hard decision HEQ and soft decision HEQ. The latter is based on minimum mean square error (MMSE) clean speech estimation. The experimental work shows that soft HEQ and hard HEQ achieve better recognition results than other HEQ approaches such as tabular HEQ, quantile HEQ, and polynomial fit HEQ. It also shows that soft HEQ achieves notably better recognition results than hard HEQ. The results further show that using HEQ improves the efficiency of other speech enhancement techniques such as stereo piece-wise linear compensation for environment (SPLICE) and vector Taylor series (VTS), and that using HEQ in multi-style training (MST) significantly improves ASR system performance.

Highlights

Optimal automatic speech recognition (ASR) takes place when the recognition system is used under circumstances identical to those in which it was trained
In this paper, we proposed a speech enhancement method based on histogram equalization (HEQ)
HEQ attempts to eliminate the nonlinear distortions of noise by transforming the probability density function (PDF) of the original noisy feature into its reference training PDF to improve the recognition performance
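
To make the PDF-matching idea in the last highlight concrete, here is a minimal sketch of conventional order-statistics HEQ for a single feature dimension. It illustrates the baseline family of methods the paper compares against, not the authors' stereo-GMM method. The function name `equalize` and the toy data are hypothetical.

```python
from bisect import bisect_right


def equalize(noisy, reference):
    """Map each noisy value to the reference value with the same CDF rank.

    Illustrative order-statistics HEQ: both CDFs are empirical,
    built by sorting the samples (an assumption of this sketch).
    """
    ref_sorted = sorted(reference)
    noisy_sorted = sorted(noisy)
    n_ref, n_noisy = len(ref_sorted), len(noisy_sorted)
    equalized = []
    for y in noisy:
        # Number of noisy samples <= y, so the noisy CDF is rank / n_noisy.
        rank = bisect_right(noisy_sorted, y)
        # Index of the smallest reference order statistic whose empirical
        # CDF reaches that value: ceil(rank * n_ref / n_noisy) - 1.
        idx = -(-rank * n_ref // n_noisy) - 1
        equalized.append(ref_sorted[idx])
    return equalized


# Toy example: the "noisy" feature is a scaled and shifted copy of the reference.
reference = [0.0, 1.0, 2.0, 3.0]
noisy = [14.0, 10.0, 16.0, 12.0]
restored = equalize(noisy, reference)
```

Because the mapping depends only on ranks, any monotonic distortion of the feature is undone in this toy case, which is why HEQ is described as handling nonlinear distortions.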

Summary

Introduction

Optimal automatic speech recognition (ASR) takes place when the recognition system is used under circumstances identical to those in which it was trained. The proposed method depends on the availability of stereo recordings of the clean training speech and its corresponding noisy speech. The estimated clean speech coefficient is obtained by applying the inverse of the reference cumulative density function (CDF) to the noisy CDF: $\hat{x} = C_X^{-1}(C_Y(y))$, where $y$ is the noisy coefficient, $C_Y$ is the noisy CDF, and $C_X$ is the reference (clean) CDF. This process is assumed to transform the test data distribution into the training data distribution. The stereo database is used to train a stereo GMM by concatenating each clean speech feature vector with the corresponding noisy speech feature vector. Another difference from other HEQ methods is that the CDF tables for both clean and noisy speech are computed using a sigmoid function that utilizes the stereo GMM, so order statistics are not used to compute the test CDF.
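
The following sketch shows one way the mapping described above could be computed from GMM parameters for a single feature dimension. Several points are assumptions, not details confirmed by this summary: the logistic approximation of the Gaussian CDF with scale 1.702, the use of bisection to invert the clean CDF, and all function names. It relies on the fact that the clean and noisy marginals of a joint (stereo) GMM share the same mixture weights. It does not implement the paper's hard or soft decision variants.

```python
import math


def sigmoid_gaussian_cdf(v, mean, std):
    """Logistic approximation of a Gaussian CDF (assumed form, scale 1.702)."""
    z = 1.702 * (v - mean) / std
    z = max(-40.0, min(40.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))


def gmm_cdf(v, weights, means, stds):
    """CDF of a 1-D mixture: weighted sum of the component CDFs."""
    return sum(w * sigmoid_gaussian_cdf(v, m, s)
               for w, m, s in zip(weights, means, stds))


def invert_gmm_cdf(p, weights, means, stds, tol=1e-9):
    """Solve gmm_cdf(x) = p by bisection; the CDF is monotonically increasing."""
    lo = min(m - 10.0 * s for m, s in zip(means, stds))
    hi = max(m + 10.0 * s for m, s in zip(means, stds))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gmm_cdf(mid, weights, means, stds) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def heq_from_stereo_gmm(y, weights, clean_means, clean_stds,
                        noisy_means, noisy_stds):
    """Estimate x_hat as the inverse clean CDF applied to the noisy CDF of y."""
    p = gmm_cdf(y, weights, noisy_means, noisy_stds)
    return invert_gmm_cdf(p, weights, clean_means, clean_stds)


# Toy single-component example: noisy = 2 * clean + 5, so y = 7 maps back to 1.
x_hat = heq_from_stereo_gmm(7.0, [1.0], [0.0], [1.0], [5.0], [2.0])
```

Computing the CDFs from model parameters in this way means the test CDF does not have to be estimated by sorting the test utterance's frames, which is the contrast with order-statistics HEQ drawn in the text.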

Applying HEQ to the test speech

Findings

Conclusions

Journal: EURASIP Journal on Audio, Speech, and Music Processing
Publication Date: Jun 9, 2015
Citations: 21
License type: CC BY 4.0
