Abstract

For voice communication, it is important to extract speech from its noisy version without introducing unnatural artificial noise. This paper studies the subband mean-squared error (MSE) of the speech for unsupervised speech enhancement approaches and reveals its relationship with the existing loss functions for supervised approaches. On that basis, it derives a generalized loss function that takes residual noise control into account for supervised approaches. Our generalized loss function contains the well-known MSE loss function and many other often-used loss functions as special cases. Compared with traditional loss functions, it offers more flexibility in trading off speech distortion against noise reduction, because a group of well-studied noise shaping schemes can be introduced to control residual noise in practical applications. Objective and subjective test results verify the importance of residual noise control for supervised speech enhancement approaches.
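The link between the MSE loss and the two quantities being traded off can be made concrete. The sketch below is not taken from the paper. It assumes a real-valued gain applied per time-frequency bin to complex STFT spectra, and the function name `mse_decomposition` is hypothetical. The MSE of the estimated speech splits exactly into three parts: a speech distortion term, a residual noise term and a speech-noise cross term. This is why weighting the first two terms separately generalizes MSE when the cross term is negligible.

```python
import numpy as np


def mse_decomposition(gain, speech, noise):
    """Split the spectral MSE of a gain-based estimate into its components.

    Hypothetical helper: assumes the estimate is gain * (speech + noise),
    computed per time-frequency bin, with speech and noise as STFT arrays.
    """
    speech_err = (gain - 1.0) * speech   # speech distortion component
    residual = gain * noise              # residual noise component
    distortion = np.mean(np.abs(speech_err) ** 2)
    residual_noise = np.mean(np.abs(residual) ** 2)
    # Cross term; it vanishes in expectation when speech and noise are uncorrelated.
    cross = 2.0 * np.mean(np.real(speech_err * np.conj(residual)))
    mse = np.mean(np.abs(gain * (speech + noise) - speech) ** 2)
    return mse, distortion, residual_noise, cross


# Toy complex spectra (frequency bins x frames) and a random real-valued gain.
rng = np.random.default_rng(0)
S = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
N = 0.5 * (rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100)))
G = rng.uniform(0.0, 1.0, size=S.shape)

mse, d, r, c = mse_decomposition(G, S, N)
print(np.isclose(mse, d + r + c))  # True: the decomposition is exact
```

A gain of 1 everywhere removes all speech distortion but keeps all the noise. A gain of 0 does the opposite. Intermediate gains trade one term against the other.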

Highlights

  • Speech enhancement plays an important role in noisy environments for many applications, such as speech communication, speech interaction and speech translation

  • This paper uses four objective measurements to analyze the performance of the proposed generalized loss: noise attenuation (NA) [25], speech attenuation (SA) [25], perceptual evaluation of speech quality (PESQ) [17], and signal-to-distortion ratio (SDR) [39]

  • Increasing β₀ decreases NA. This is because the residual noise control mechanism is introduced into the optimization: during training, the residual noise in the estimated spectra gradually approaches the preset residual noise threshold
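The NA and SA measures can be sketched as follows. The sketch assumes a "white-box" setup in which the estimated gain is applied separately to the clean speech and to the noise. The definitions in [25] may differ in detail (for example, they may be computed in the time domain), and the function names are hypothetical.

The loop illustrates the highlight on β₀. Suppose training drives the residual noise power to a fraction β₀ of the input noise power. Then NA equals −10·log10(β₀) dB, which falls as β₀ grows.

```python
import numpy as np


def noise_attenuation_db(gain, noise):
    """Hypothetical NA in dB: input noise power over gain-filtered noise power."""
    return 10.0 * np.log10(np.sum(np.abs(noise) ** 2) / np.sum(np.abs(gain * noise) ** 2))


def speech_attenuation_db(gain, speech):
    """Hypothetical SA in dB: clean speech power over gain-filtered speech power."""
    return 10.0 * np.log10(np.sum(np.abs(speech) ** 2) / np.sum(np.abs(gain * speech) ** 2))


rng = np.random.default_rng(1)
noise = rng.standard_normal((257, 50))
speech = rng.standard_normal((257, 50))
for beta0 in (0.01, 0.1, 0.3):
    # A uniform gain of sqrt(beta0) leaves exactly beta0 of the noise power,
    # i.e. the residual noise sits at an assumed threshold of beta0.
    g = np.full(noise.shape, np.sqrt(beta0))
    print(beta0, round(float(noise_attenuation_db(g, noise)), 2))
# NA falls as beta0 grows: about 20.0, 10.0 and 5.23 dB
```

In a real system the gain varies per bin and also attenuates speech, so SA would be reported alongside NA to expose that trade-off.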


Summary

Introduction

Speech enhancement plays an important role in noisy environments for many applications, such as speech communication, speech interaction and speech translation. Conventional approaches include spectral subtraction [1], statistical methods [2,3] and subspace-based methods [4], which have proved effective when the additive noise is stationary or quasi-stationary. However, their performance often degrades heavily under non-stationary and low signal-to-noise ratio (SNR) conditions.

Sci. 2020, 10, 2894

… over-smoothing estimation, which omits some important detailed information. To solve these problems, many new criteria that consider speech perception have been proposed in recent years [12,13,14,15]. In [21], speech distortion and residual noise are considered separately in the loss function, known as the components loss (CL). CL obtains relatively better metric scores than MSE when suitable loss weighting coefficients are selected. We derive a generalized loss function by introducing multiple manually set parameters to flexibly balance speech distortion and noise attenuation.
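One way to read the components loss and its generalization is as a weighted sum of a speech distortion term and a residual noise term. The sketch below is a hypothetical illustration, not the loss derived in the paper or the exact CL of [21]. In it, `alpha` weights speech distortion, and `beta0` is an assumed residual-noise threshold (relative to the input noise power) below which residual noise is not penalized. With `beta0 = 0` it reduces to a plain weighted components-style loss.

```python
import numpy as np


def generalized_loss(gain, speech, noise, alpha=0.5, beta0=0.0):
    """Hypothetical weighted loss: speech distortion vs. residual noise above a threshold.

    alpha: weight on speech distortion (1 - alpha goes to residual noise).
    beta0: assumed residual-noise threshold, relative to the input noise power.
    """
    distortion = np.mean(np.abs((gain - 1.0) * speech) ** 2)
    residual = np.mean(np.abs(gain * noise) ** 2)
    noise_power = np.mean(np.abs(noise) ** 2)
    # Hinge: residual noise below beta0 * noise_power is not penalized.
    residual_excess = max(residual - beta0 * noise_power, 0.0)
    return alpha * distortion + (1.0 - alpha) * residual_excess


rng = np.random.default_rng(2)
S = rng.standard_normal((257, 40))
N = 0.5 * rng.standard_normal((257, 40))
G = rng.uniform(0.0, 1.0, size=S.shape)

cl = generalized_loss(G, S, N, alpha=0.5, beta0=0.0)   # components-style special case
thr = generalized_loss(G, S, N, alpha=0.5, beta0=0.3)  # some residual noise tolerated
```

The hinge penalizes only residual noise above the threshold. The distortion term would therefore tend to push the gain toward 1 until the residual noise settles near `beta0` times the noise power, which is in line with the behavior described in the highlights.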

Problem Formulation
Trade-Off Criterion in Subband
Trade-Off Criterion in Fullband
A Generalized Loss Function
Dataset
Experimental Settings
Network Architecture
Loss Functions and Training Models
Results and Analysis
The Impact of α
Subjective Evaluation
Conclusions
