Abstract
Existing speech enhancement methods based on deep neural networks (DNNs) use network architectures that are not designed specifically for speech enhancement and that extract local features of noisy speech in a non-causal way. Inspired by the feature calculation based on time–frequency correlation in improved minima controlled recursive averaging (IMCRA), this paper proposes a time–frequency smoothing neural network for speech enhancement, which uses a long short-term memory (LSTM) network and a convolutional neural network (CNN) to model the correlation along the time and frequency dimensions, respectively. To verify the effectiveness of the proposed network, several causal speech enhancement systems are built on different networks and compared in extensive experiments on speech quality and intelligibility. The experimental results show that the proposed network yields better speech enhancement performance than the other networks.
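To make the IMCRA-style smoothing that motivates the network concrete, here is a minimal sketch of fixed, causal time–frequency smoothing of a power spectrogram: a weighted average across neighbouring frequency bins, followed by first-order recursive averaging over frames. It is not the proposed network, which, per the abstract, models these two correlations with a CNN and an LSTM instead of fixed operations. The function names, the three-tap window, and the smoothing factor `alpha = 0.9` are illustrative assumptions, not values taken from the paper.

```python
from typing import List, Sequence


def smooth_frequency(frame: Sequence[float], window: Sequence[float]) -> List[float]:
    """Weighted average over neighbouring frequency bins of a single frame."""
    half = len(window) // 2
    n_bins = len(frame)
    out = []
    for k in range(n_bins):
        acc = 0.0
        norm = 0.0
        for i, w in enumerate(window):
            j = k + i - half
            if 0 <= j < n_bins:  # skip taps outside the spectrum, then renormalise
                acc += w * frame[j]
                norm += w
        out.append(acc / norm)
    return out


def smooth_time_frequency(
    power_spec: Sequence[Sequence[float]],
    window: Sequence[float] = (0.25, 0.5, 0.25),  # assumed illustrative window
    alpha: float = 0.9,                            # assumed smoothing factor
) -> List[List[float]]:
    """Causal smoothing: frame l depends only on frames 0..l.

    S(k, l) = alpha * S(k, l-1) + (1 - alpha) * S_f(k, l),
    where S_f is the frequency-smoothed power of the current frame.
    """
    smoothed: List[List[float]] = []
    prev = None
    for frame in power_spec:
        s_f = smooth_frequency(frame, window)
        if prev is None:
            cur = s_f  # first frame: nothing to average with yet
        else:
            cur = [alpha * p + (1.0 - alpha) * f for p, f in zip(prev, s_f)]
        smoothed.append(cur)
        prev = cur
    return smoothed
```

Because each output frame is computed from the current and earlier frames only, the smoothing is causal; a learned counterpart would keep that property by using a unidirectional recurrent layer along time.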