Single‐channel dereverberation and denoising based on lower band trained SA‐LSTMs

Yi Li, Syed Mohsen Naqvi, Yang Sun

doi:10.1049/iet-spr.2020.0134

Open Access

Abstract

Supervised single-channel speech enhancement presents a single mixture recording at the input of a neural network and updates the network parameters so that the output approximates the clean speech signal. However, current neural network-based single-channel speech enhancement methods cannot fully exploit the specific frequency range occupied by speech signals while keeping computational complexity limited. In this study, the authors examine the power spectral density of mixtures containing human speech and noise interference. Based on the observation that speech energy is concentrated in the lower frequency band, they propose training signal approximation (SA) based neural networks on the lower frequency band of the speech mixture to improve performance. To realise this lower band approach for single-channel speech enhancement, the method uses a long short-term memory (LSTM) block that operates on the short-time Fourier transform of the desired frequency range. Furthermore, to improve speech enhancement performance in reverberant room environments, the dereverberation mask and the enhanced ratio mask are used as the training targets of two LSTM blocks, respectively. Detailed evaluations confirm that the proposed method outperforms state-of-the-art methods.
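The lower band idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 4 kHz cutoff, the frame settings, the generic energy ratio mask, and every function name below are assumptions made for illustration, and the oracle mask stands in for the output of a trained LSTM. The abstract does not define the dereverberation mask or the enhanced ratio mask, so neither is reproduced here.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=256):
    """Magnitude STFT with a Hann window; shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def lower_band(spec, fs, cutoff_hz, n_fft=512):
    """Keep only the frequency bins at or below cutoff_hz (assumed cutoff)."""
    n_bins = int(np.floor(cutoff_hz * n_fft / fs)) + 1
    return spec[:, :n_bins]

def ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Generic energy ratio mask in [0, 1]; a stand-in, not the paper's mask."""
    return speech_mag**2 / (speech_mag**2 + noise_mag**2 + eps)

def sa_loss(mask, mixture_mag, speech_mag):
    """Signal approximation loss: error of the masked mixture against speech."""
    return float(np.mean((mask * mixture_mag - speech_mag) ** 2))

# Toy signals: a 300 Hz tone as "speech" plus white noise (illustrative only).
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 300 * t)
noise = 0.3 * rng.standard_normal(fs)
mixture = speech + noise

# Restrict every spectrogram to the lower band before masking.
cutoff_hz = 4000  # assumed value, not taken from the paper
speech_lb = lower_band(stft_mag(speech), fs, cutoff_hz)
noise_lb = lower_band(stft_mag(noise), fs, cutoff_hz)
mixture_lb = lower_band(stft_mag(mixture), fs, cutoff_hz)

mask = ratio_mask(speech_lb, noise_lb)      # oracle target on the lower band
loss = sa_loss(mask, mixture_lb, speech_lb)  # what an SA-trained network minimises
```

With these settings the lower band keeps 129 of the 257 frequency bins, so a network operating on it would see roughly half the input dimension of a full-band model.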

Full Text