Abstract

This work develops an adaptive wavelet thresholding algorithm for speech enhancement that significantly outperforms other wavelet-based counterparts. This is accomplished by formulating the optimum threshold for noise reduction on the basis of generalized Gaussian priors, which fully characterize the statistics of speech and noise wavelet coefficients. In addition, frame-wise context modeling tracks the statistical characteristics of each individual coefficient on a frame-wise basis, so that the optimum threshold is accurate and adaptive at both the coefficient level and the frame level. The frame-wise context model is formulated through the context subspace projection of the wavelet coefficients, with the context index serving as the invariant correspondence between the parameters of successive frames, thereby enabling frame-wise tracking at the coefficient level. Simulation results show significant improvement over existing wavelet-based speech enhancement algorithms: by as much as 226% in segmental signal-to-noise ratio improvement, by 36% in the perceptual evaluation of speech quality, by 17.8% in short-time objective intelligibility and by 33.3% in cepstral distance. When benchmarked against well-established short-time-Fourier-transform-based counterparts, the proposed wavelet thresholding algorithm offers favorable and more robust performance, particularly under non-stationary noise conditions, with no adverse musical noise effect.

Highlights

  • Besides its typical applications in teleconferencing, hands-free communications, hearing devices, etc., signal processing for speech enhancement (SE) has recently witnessed increasing demand in emerging applications

  • We propose an adaptive wavelet thresholding algorithm with the generalized Gaussian (GG) priors and frame-wise context modeling for general purpose speech enhancement, with emphasis on emerging voice-control applications

  • The proposed adaptive wavelet thresholding for speech enhancement, based on the GG priors and frame-wise context modeling, has been demonstrated to provide considerable improvement over other wavelet transform (WT)-based algorithms


Summary

INTRODUCTION

Besides its typical applications in teleconferencing, hands-free communications, hearing devices, etc., signal processing for speech enhancement (SE) has recently witnessed increasing demand in emerging applications. The wavelet transform (WT) makes no use of windowing; windowing inevitably entails a bias-variance trade-off in spectral estimation and can generate musical noise, as experienced in the short-time Fourier transform (STFT) domain [39]. Another benefit of the WT is the simplicity of processing the real-valued speech wavelet coefficients instead of the complex values of the STFT domain. Instead of the Gaussian priors in [36] and [42], the Student's t-distribution prior in [37] and the Rayleigh prior in [38] for the derivation of the optimum threshold, the proposed algorithm uses generalized Gaussian (GG) priors, which fully represent the statistical characteristics of speech and noise wavelet coefficients and yield a more accurate optimum threshold over various non-stationary noise conditions.
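The general idea of frame-wise wavelet thresholding can be illustrated with a minimal sketch. It is not the proposed algorithm: it assumes a one-level Haar transform, a single threshold per frame rather than per-coefficient context modeling, a known noise variance, and the well-known BayesShrink-style rule T = σ_n² / σ_x (derived for a GG signal prior with Gaussian noise) in place of the optimum threshold formulated in this work. All function names and the frame length are hypothetical and chosen for illustration only.

```python
import math


def haar_dwt(x):
    """One-level orthonormal Haar DWT of an even-length sequence."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = 1.0 / math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) * s)
        x.append((a - d) * s)
    return x


def soft_threshold(c, t):
    """Shrink a real coefficient toward zero by t (soft thresholding)."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def frame_threshold(detail, noise_var):
    """BayesShrink-style threshold noise_var / sigma_x for one frame.

    Assumption: noise_var is known. The clean-signal variance is estimated
    as (coefficient power - noise_var), floored at zero.
    """
    power = sum(d * d for d in detail) / len(detail)
    signal_var = max(power - noise_var, 0.0)
    if signal_var == 0.0:
        # Frame looks like pure noise: pick a threshold that zeroes it.
        return max(abs(d) for d in detail)
    return noise_var / math.sqrt(signal_var)


def enhance(signal, noise_var, frame_len=64):
    """Threshold the detail coefficients frame by frame.

    Trailing samples that do not fill a whole frame are dropped.
    """
    out = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        approx, detail = haar_dwt(frame)
        t = frame_threshold(detail, noise_var)  # recomputed for every frame
        detail = [soft_threshold(d, t) for d in detail]
        out.extend(haar_idwt(approx, detail))
    return out
```

Calling `enhance(noisy, noise_var)` on a list of samples returns the thresholded signal. Because the Haar transform here is orthonormal, white noise keeps the same variance in the coefficient domain, which is why the time-domain `noise_var` is passed to `frame_threshold` unchanged.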

OVERVIEW OF PROPOSED ADAPTIVE WAVELET THRESHOLDING
CONTEXT CLUSTERING
FRAME-WISE CONTEXT UPDATE OF NOISE
SIMULATION AND PERFORMANCE EVALUATION
SIMULATION RESULTS
Findings
CONCLUSION AND PROSPECTS
