Abstract
Supervised single-channel speech enhancement presents a single mixture recording to a neural network and updates the network parameters so that the output reconstructs the speech signal. However, current neural network-based single-channel speech enhancement methods do not fully exploit the specific frequency range occupied by speech signals while keeping computational complexity limited. In this study, the authors examine the power spectral density of mixtures of human speech and noise interference. Based on the premise that the speech signal is concentrated in the lower frequency band, they propose training signal approximation (SA) based neural networks on the lower frequency band of the speech mixture to improve performance. To realise this lower-band approach for single-channel speech enhancement, the method uses a long short-term memory (LSTM) block that operates on the short-time Fourier transform (STFT) of the desired frequency range. Furthermore, to improve speech enhancement performance in reverberant room environments, the dereverberation mask and the enhanced ratio mask are used as the training targets of two LSTM blocks, respectively. Detailed evaluations confirm that the proposed method outperforms state-of-the-art methods.
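The lower-band idea can be illustrated with a minimal sketch that computes a magnitude STFT and keeps only the bins up to a cutoff frequency before they would be passed to a network. This is not the authors' implementation: the function names `stft_magnitude` and `lower_band`, the 16 kHz sample rate, the 512-point FFT, the 4 kHz cutoff, and the synthetic test signal are all assumptions chosen for illustration, and the LSTM blocks and mask targets are omitted.

```python
import numpy as np


def stft_magnitude(signal, n_fft=512, hop=256):
    """Magnitude STFT with a Hann window; shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def lower_band(spec, sample_rate, n_fft, cutoff_hz):
    """Keep only the frequency bins at or below cutoff_hz."""
    n_keep = int(cutoff_hz * n_fft / sample_rate) + 1
    n_keep = min(n_keep, spec.shape[1])
    freqs = np.arange(n_keep) * sample_rate / n_fft
    return spec[:, :n_keep], freqs


# Hypothetical settings (not taken from the paper).
sample_rate = 16000
n_fft = 512
cutoff_hz = 4000.0

# Synthetic stand-in for a speech mixture: a low-frequency tone plus white noise.
rng = np.random.default_rng(0)
t = np.arange(sample_rate) / sample_rate
mixture = np.sin(2 * np.pi * 300 * t) + 0.1 * rng.standard_normal(sample_rate)

spec = stft_magnitude(mixture, n_fft=n_fft)
band, band_freqs = lower_band(spec, sample_rate, n_fft, cutoff_hz)

# Share of spectral energy that falls inside the retained lower band.
low_energy_share = (band ** 2).sum() / (spec ** 2).sum()
```

With these assumed settings the network input shrinks from 257 to 129 frequency bins per frame, which is the kind of reduction in input size that the lower-band approach relies on.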