Abstract

In this paper, we present a new two-stage speech enhancement approach, specifically designed to reduce musical noise and other random noises without requiring their localization in the time–frequency domain. The proposed method is motivated by two observations: (1) the energy peaks corresponding to musical noise are randomly scattered in the spectrogram of the processed speech; and (2) Wiener filter gains calculated at different frequencies are correlated. In the first stage of the proposed method, a preliminary gain function is generated using the nonnegative matrix factorization (NMF) algorithm. In the second stage, a modified gain function that is more robust to noise artefacts, referred to as the calibrated filter, is estimated by applying a DNN-based nonlinear mapping function to the preliminary gain function. To further decrease the variability of the estimated calibrated filter, we propose to extend the DNN-based extraction of frequency dependencies to a set of preliminary gain functions derived from spectral estimates based on a family of data tapers; the resulting calibrated filter is referred to as the multi-filter. The evaluation of the proposed DNN-based calibrated filter models for speech enhancement, under different noise types and input SNR levels, shows substantial improvements in standard speech quality and intelligibility measures compared to the uncalibrated filter.
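
The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made for illustration only: Euclidean NMF with multiplicative updates, pre-trained speech and noise dictionaries (`W_speech`, `W_noise`), a Wiener-like ratio as the preliminary gain, and a one-hidden-layer network with untrained parameters standing in for the DNN mapping. All function and parameter names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf_activations(V, W, n_iter=200, seed=0):
    """Estimate activations H >= 0 such that V ~= W @ H, with W held fixed.

    Uses multiplicative updates for the Euclidean cost (an assumed choice).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
    return H


def preliminary_gain(V_noisy, W_speech, W_noise, n_iter=200):
    """Stage 1 (sketch): Wiener-like gain from NMF speech and noise estimates."""
    W = np.hstack([W_speech, W_noise])
    H = nmf_activations(V_noisy, W, n_iter=n_iter)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]   # speech magnitude estimate
    N = W_noise @ H[k:]    # noise magnitude estimate
    return S / (S + N + EPS)


def calibrate(G, params):
    """Stage 2 (sketch): nonlinear mapping applied to each gain frame.

    `params` = (W1, b1, W2, b2) is a hypothetical one-hidden-layer network;
    in practice it would be trained, here it is only a placeholder.
    Each output bin depends on all input bins, so the mapping can exploit
    dependencies between gains at different frequencies.
    """
    W1, b1, W2, b2 = params
    hidden = np.maximum(0.0, W1 @ G + b1[:, None])   # ReLU
    logits = W2 @ hidden + b2[:, None]
    return 1.0 / (1.0 + np.exp(-logits))             # sigmoid keeps gain in [0, 1]
```

The calibrated gain would then be multiplied element-wise with the noisy magnitude spectrogram before resynthesis. The multi-filter variant described above would, under the same assumptions, feed several preliminary gains (one per data taper) into the mapping instead of a single one.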
