In this paper, we propose a novel single-channel speech enhancement algorithm that applies dual-domain transforms comprising of dual-tree complex wavelet transform (DTCWT) and short-time Fourier transform (STFT) with a sparse non-negative matrix factorization (SNMF). The first domain belongs to the DTCWT, which is utilized on the time domain signals to conquer the weakness of signal distortions brought about by the downsampling of the discrete wavelet packet transform (DWPT) and delivered a set of subband signals. The second domain alludes to the STFT, which is exploited to each subband signal and built a complex spectrogram. At last, we apply the SNMF to the magnitude spectrogram for extracting speech components. In short, the DTCWT decomposes the time-domain noisy signal into a set of subband signals and afterward applied STFT to each subband signal, and we get nonnegative matrices by taking the absolute value of the complex matrix. From this point forward, we apply SNMF to each nonnegative matrix and identify the speech components. Finally, the estimated signal can be achieved through a subband binary ratio mask (SBRM) by applying the inverse STFT (ISTFT) and, subsequently, the inverse DTCWT (IDTCWT). The proposed approach is assessed utilizing the GRID audio-visual and IEEE databases, and diverse kinds of noises such as stationary, non-stationary, and quasi-stationary. The exploratory outcomes demonstrate that the proposed algorithm improved objective speech quality and intelligibility altogether at all considered signal to noise ratios (SNRs), compared to the other seven speech enhancement methods of STFT-SNMF, STFT-SNMFSE, MLD-STFT-SNMF, STFT-GDL, STFT-CJSR, DTCWT-SNMF, and DWPT-STFT-SNMF.
Read full abstract