Abstract

This paper presents a novel data adaptive thresholding approach to single channel speech enhancement. The noisy speech signal and fractional Gaussian noise (fGn) are combined to form a complex signal. The fGn is generated using a rough estimate of the noise variance obtained from the noisy speech signal. Bivariate empirical mode decomposition (bEMD) is employed to decompose the complex signal into a finite number of complex-valued intrinsic mode functions (IMFs). The real and imaginary parts of the IMFs represent the IMFs of the observed speech and the fGn, respectively. Each IMF is divided into short time frames for local processing. The variance of the fGn IMF computed within a frame is used as the reference term to classify the corresponding noisy speech frame as either noise dominant or signal dominant. Only the noise dominant frames are soft-thresholded to reduce the noise effects. All the frames and the speech IMFs are then recombined, yielding the enhanced speech signal. The experimental results show improved performance of the proposed algorithm compared to recently reported methods.
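The frame-level step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact classification rule or threshold value, so the rule used here (a frame is noise dominant when its variance does not exceed `ratio` times the fGn reference variance), the threshold (the square root of the reference variance), and the names `soft_threshold`, `enhance_imf`, `frame_len`, and `ratio` are all assumptions made for illustration. The sketch also assumes the IMFs of the speech and of the fGn have already been obtained from a bEMD routine.

```python
import numpy as np


def soft_threshold(x, t):
    """Shrink every sample toward zero by t; samples with |x| <= t become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def enhance_imf(speech_imf, fgn_imf, frame_len=256, ratio=1.0):
    """Frame-wise thresholding of one speech IMF against its fGn reference IMF.

    Assumed rule (not taken from the paper): a frame is treated as noise
    dominant when its variance is at most `ratio` times the variance of the
    fGn IMF in the same frame. Only such frames are soft-thresholded.
    """
    speech_imf = np.asarray(speech_imf, dtype=float)
    fgn_imf = np.asarray(fgn_imf, dtype=float)
    out = speech_imf.copy()

    for start in range(0, len(speech_imf), frame_len):
        frame = slice(start, start + frame_len)
        ref_var = np.var(fgn_imf[frame])           # reference term for this frame
        if np.var(speech_imf[frame]) <= ratio * ref_var:
            # Noise dominant frame: assumed threshold is the reference std. dev.
            out[frame] = soft_threshold(speech_imf[frame], np.sqrt(ref_var))
        # Signal dominant frames are left unchanged.
    return out
```

Under these assumptions, the enhanced signal would be obtained by applying `enhance_imf` to every speech IMF (the real parts of the complex IMFs) and summing the results. The classification rule and threshold would need to be replaced by those given in the full paper.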

Highlights

  • The research on speech enhancement is motivated by the rapidly growing market of speech communication applications, such as teleconferencing, hands-free telephony, hearing-aids, and speech recognition

  • Although the human auditory system is remarkably robust in most adverse situations, noise heavily degrades the performance of automatic speech recognition (ASR) systems

  • Its main drawback is the need to find the speechless part of the signal in order to determine the noise variance. The performance of this method therefore depends on the efficiency of voice activity detection (VAD), which makes it inconvenient to implement in practical applications

Summary

Introduction

The research on speech enhancement is motivated by the rapidly growing market of speech communication applications, such as teleconferencing, hands-free telephony, hearing aids, and speech recognition. Its basic requirement is the noise spectrum, which is determined from the nonspeech segments [3]. In such a single channel speech enhancement system, residual noise is a common issue. Instead of the speech signal, the variance of each IMF is used to determine the adaptive threshold, and better performance is achieved in [5]. Its main drawback is the need to find the speechless part of the signal in order to determine the noise variance. The performance of this method therefore depends on the efficiency of voice activity detection (VAD), which makes it inconvenient to implement in practical applications. This paper is organized as follows: the application of bEMD to speech and noise signals is described in Section 2, the noise variance estimation process is explained in Section 3, the proposed speech enhancement method using bEMD is described in Section 4, experimental results are illustrated in Section 5, and Section 6 contains some concluding remarks.

BEMD of Speech and Reference Signals
Estimation of Noise Variance
Speech Enhancement Method
Experimental Results and Discussions
Conclusions