A major drawback of many speech enhancement methods is that they leave an annoying residual noise with a musical character. The Wiener filter introduces less musical noise than spectral subtraction methods, but such noise still exists and is perceptually annoying to the listener. A potential remedy for this artifact is to incorporate a psychoacoustic model into the design of the suppression filter. In this paper, a frequency-domain optimal linear estimator with perceptual post-filtering is proposed, which exploits the masking properties of the human auditory system to render the residual noise distortion inaudible. The proposed post-processing uses a modified method of computing the tonality coefficient and the relative threshold offset to obtain an optimal estimate of the noise masking threshold. The performance of the proposed enhancement algorithm is evaluated using segmental SNR, Modified Bark Spectral Distortion (MBSD), and Perceptual Evaluation of Speech Quality (PESQ) under various noisy environments, and it yields better results than Wiener filtering based on Ephraim and Malah's decision-directed approach.
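For orientation, the following Python sketch shows textbook versions of two ingredients the abstract names: the Wiener gain driven by the Ephraim-Malah decision-directed a priori SNR estimate (the baseline), and Johnston's spectral-flatness tonality coefficient with its relative threshold offset (the standard starting point for a masking-threshold estimate). It is not the paper's modified tonality/offset computation, which the abstract does not specify. The function names, the smoothing factor of 0.98, the a priori SNR floor, and the -60 dB flatness reference are common choices assumed here for illustration, and the noise power spectrum is assumed to be estimated elsewhere.

```python
import math

def decision_directed_wiener(noisy_power, noise_power, prev_clean_power,
                             smoothing=0.98, xi_min=1e-3):
    """Per-bin Wiener gain using the decision-directed a priori SNR estimate.

    noisy_power:      |Y_k|^2 of the current noisy frame
    noise_power:      noise PSD estimate per bin (assumed given)
    prev_clean_power: |X_hat_k|^2 of the previous enhanced frame
    Returns (gains, clean_power); clean_power feeds the next frame.
    """
    gains, clean_power = [], []
    for y2, d2, x2_prev in zip(noisy_power, noise_power, prev_clean_power):
        gamma = y2 / d2  # a posteriori SNR
        # Weighted mix of the previous-frame estimate and the ML estimate.
        xi = smoothing * x2_prev / d2 + (1.0 - smoothing) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)  # floor on the a priori SNR (assumed value)
        g = xi / (1.0 + xi)   # Wiener gain
        gains.append(g)
        clean_power.append(g * g * y2)
    return gains, clean_power

def johnston_tonality(power_spectrum, sfm_db_max=-60.0):
    """Tonality coefficient in [0, 1] from the spectral flatness measure (dB)."""
    eps = 1e-12  # guards log(0) for empty bins
    p = [x + eps for x in power_spectrum]
    n = len(p)
    geometric = math.exp(sum(math.log(x) for x in p) / n)
    arithmetic = sum(p) / n
    sfm_db = 10.0 * math.log10(geometric / arithmetic)  # <= 0 dB
    # 0 for a flat (noise-like) spectrum, 1 for a strongly tonal one.
    return max(0.0, min(sfm_db / sfm_db_max, 1.0))

def johnston_offset_db(tonality, band):
    """Relative threshold offset in dB for Bark band index `band`."""
    return tonality * (14.5 + band) + (1.0 - tonality) * 5.5
```

In this formulation a tonal masker (tonality near 1) lowers the masking threshold by more decibels than a noise-like one, so less residual noise is assumed to be masked near tonal speech components.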