Abstract
Speech distortion and residual noise in enhanced speech are two conflicting factors that limit the performance of speech enhancement based on deep neural networks. Because noise environments vary widely and human auditory perception is complex, a speech enhancement network with a single, fixed output characteristic cannot always achieve a good trade-off between speech preservation and noise suppression for different listeners in different noise environments. To address this problem, this paper proposes dynamic controllable speech enhancement models based on quantile loss functions in the time-frequency domain and the time domain. Each model is composed of multiple speech enhancement networks with different trade-offs between speech preservation and noise suppression. To train speech enhancement networks with different output characteristics, this paper designs quantile loss functions for speech enhancement in the time-frequency domain and the time domain that balance the overestimation and underestimation of speech, thereby indirectly controlling the speech preservation and noise suppression levels of the networks. Experimental results show that the proposed quantile loss functions can effectively control the speech preservation and noise suppression performance of speech enhancement networks, and that the resulting dynamic controllable models have multiple output characteristics. When using these models for speech enhancement, listeners can control the output characteristics according to their subjective auditory perception, thereby obtaining better speech enhancement performance.
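To illustrate how a quantile loss can trade off overestimation against underestimation, the following is a minimal sketch of the standard quantile (pinball) loss applied to a sequence of samples. It is not the paper's exact formulation, which the abstract does not give. The function name `quantile_loss`, the parameter `tau`, and the toy values are assumptions made for illustration only.

```python
def quantile_loss(target, estimate, tau):
    """Mean pinball loss between target and estimate (hypothetical helper).

    tau in (0, 1) is the quantile level. With tau > 0.5, underestimating
    the target costs more than overestimating it; with tau < 0.5 the
    reverse holds; tau = 0.5 weights both errors equally.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(target) != len(estimate) or len(target) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for s, s_hat in zip(target, estimate):
        err = s - s_hat  # positive when the estimate is too small
        total += max(tau * err, (tau - 1.0) * err)
    return total / len(target)


# Toy values (assumed): the same absolute error in both directions.
clean = [1.0, 1.0]
under = [0.5, 0.5]  # underestimates the clean speech
over = [1.5, 1.5]   # overestimates the clean speech

loss_under = quantile_loss(clean, under, tau=0.8)
loss_over = quantile_loss(clean, over, tau=0.8)
```

In this sketch, choosing `tau = 0.8` makes an underestimate four times as costly as an overestimate of the same size, so a network trained with it would be pushed toward overestimating speech, which plausibly corresponds to favoring speech preservation over noise suppression. Training several networks with different `tau` values is one way to obtain the range of output characteristics the abstract describes.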