Abstract

Denoising single-channel speech (recorded using one microphone) remains an open problem in many speech-related applications. Recently, supervised deep learning methods have been used to denoise the speech signal. This work uses a Deep Neural Network (DNN) to learn the Time–Frequency (T-F) mask of the clean speech from noisy speech features. In general, the Ideal Binary Mask (IBM) is used as a binary training target to improve speech intelligibility, and the Ideal Ratio Mask (IRM) is used as a non-binary training target to improve speech quality. Still, these masks are not necessarily the best T-F masks for improving speech quality and intelligibility, and an appropriate training target for supervised deep learning methods remains unclear. In this work, a novel non-binary soft T-F mask named the Optimum Soft Mask (OSM) is proposed, analyzed, and compared with the different T-F mask types used in single-channel speech denoising methods. In addition, the proposed target T-F mask is compared with existing state-of-the-art approaches to show a clear performance advantage of supervised deep learning models. The binary and non-binary DNN training targets are evaluated under different signal-to-noise ratios (SNRs) and noise conditions in terms of their improvement in speech quality and intelligibility. The experimental results reveal that the binary mask IBM shows a significant improvement in speech intelligibility, while the non-binary mask IRM shows a substantial improvement in speech quality. At the same time, the proposed soft T-F mask shows a notable improvement in both quality and intelligibility under various test conditions.
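
To make the two baseline training targets concrete, the following is a minimal sketch of how an IBM and an IRM are commonly computed from the clean-speech and noise power spectrograms. It uses the commonly cited textbook definitions, not necessarily the exact formulation of this work: the function names, the local criterion `lc_db` (assumed 0 dB), the IRM exponent `beta` (assumed 0.5), and the toy power values are all illustrative assumptions. The abstract does not define the OSM, so it is not sketched here.

```python
import numpy as np


def ideal_binary_mask(clean_power, noise_power, lc_db=0.0, eps=1e-12):
    """Binary target: 1 where the local SNR (dB) exceeds lc_db, else 0.

    lc_db (the local criterion) is an assumed default, not taken from the paper.
    """
    local_snr_db = 10.0 * np.log10((clean_power + eps) / (noise_power + eps))
    return (local_snr_db > lc_db).astype(np.float64)


def ideal_ratio_mask(clean_power, noise_power, beta=0.5, eps=1e-12):
    """Soft target in [0, 1]: (S / (S + N)) ** beta per T-F unit.

    beta = 0.5 is a commonly used value, assumed here for illustration.
    """
    return (clean_power / (clean_power + noise_power + eps)) ** beta


# Toy 2x2 "spectrograms" (frequency x time) of power values, for illustration only.
clean_power = np.array([[4.0, 0.0], [1.0, 9.0]])
noise_power = np.array([[1.0, 1.0], [1.0, 1.0]])

ibm = ideal_binary_mask(clean_power, noise_power)
irm = ideal_ratio_mask(clean_power, noise_power)

# Applying a mask to the noisy power spectrogram gives the denoised estimate.
noisy_power = clean_power + noise_power
denoised_with_irm = irm * noisy_power
```

The IBM makes a hard keep-or-discard decision for every T-F unit, whereas the IRM scales each unit by a value between 0 and 1. This difference is what the abstract refers to when it contrasts binary and non-binary training targets.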
