Abstract

Supervised approaches based on deep neural networks have shown great success in speech enhancement. Among these approaches, time–frequency domain methods have shown considerable potential, and their loss functions have evolved in recent years from a magnitude-only constraint to complex spectrum optimization. More recently, loss functions that combine the magnitude constraint with complex spectrum optimization have been widely used to further improve speech quality and intelligibility. Power compression has also been introduced into these loss functions to further improve the quality of the estimated speech. Although these loss functions have proven effective in practice, their properties have not been analyzed thoroughly and rigorously. This study examines these loss functions in depth and shows that combining the magnitude constraint with complex spectrum optimization, as well as applying power compression, can generally be regarded as a trade-off between phase recovery and magnitude estimation. A detailed geometric interpretation and mathematical derivation are provided to illustrate this trade-off mechanism. Experimental results on different speech enhancement tasks are consistent with the theoretical analysis, and they verify that a reasonable trade-off between phase recovery and magnitude estimation can improve speech quality and intelligibility.
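
As a rough illustration of the kind of loss function discussed above, the following sketch combines a power-compressed magnitude term with a power-compressed complex spectrum term. The exact formulation used in the study is not given in this abstract, so the function name `compressed_combined_loss`, the weighting parameter `alpha`, the compression exponent `p`, and their default values are all assumptions made for illustration only.

```python
import numpy as np

def compressed_combined_loss(S, S_hat, alpha=0.5, p=0.3):
    """Illustrative (assumed) loss on complex STFT arrays S (clean) and S_hat (estimate).

    alpha: weight of the complex-spectrum term (hypothetical parameter);
           alpha=0 gives a magnitude-only loss, alpha=1 a complex-only loss.
    p:     power-compression exponent applied to magnitudes (hypothetical value).
    """
    # Power-compress the magnitudes; phases are left unchanged.
    mag_c = np.abs(S) ** p
    mag_hat_c = np.abs(S_hat) ** p

    # Rebuild compressed complex spectra from compressed magnitude and original phase.
    S_c = mag_c * np.exp(1j * np.angle(S))
    S_hat_c = mag_hat_c * np.exp(1j * np.angle(S_hat))

    # Magnitude term ignores phase; complex term penalizes both magnitude and phase errors.
    mag_term = np.mean((mag_c - mag_hat_c) ** 2)
    complex_term = np.mean(np.abs(S_c - S_hat_c) ** 2)

    return alpha * complex_term + (1.0 - alpha) * mag_term

# Example: an estimate with correct magnitudes but a constant phase error.
S = np.array([1 + 1j, 2 - 1j, -0.5 + 0.25j])
S_hat = S * np.exp(1j * 0.5)
loss_mag_only = compressed_combined_loss(S, S_hat, alpha=0.0)
loss_complex_only = compressed_combined_loss(S, S_hat, alpha=1.0)
```

In this sketch, a pure phase error leaves the magnitude term at essentially zero while the complex term is positive, so `alpha` controls how strongly phase errors are penalized relative to magnitude errors. This is one simple way to picture the trade-off between phase recovery and magnitude estimation described in the abstract.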
