Abstract

In recent years, convolutional neural networks (CNNs) have been widely exploited in deep neural network (DNN)-based speech enhancement methods. However, the representation power of CNNs for speech modeling is limited because their convolution kernels are spatially agnostic. This letter proposes a novel feature-specific convolutional neural network (FSCNet) for real-time speech enhancement. In FSCNet, an encoder and a decoder perform the forward and inverse feature-space transformations, respectively, while a denoising module based on feature-specific convolution (FSC) enhances the resulting deep features. The FSC kernels are dynamically parameterized at each time-frequency location, leveraging long-term global context and the importance of each feature channel for speech modeling. A function-constrained loss is further proposed to train FSCNet, ensuring that the encoder, denoising modules, and decoder function as intended. Experimental results show that the proposed FSCNet outperforms state-of-the-art denoising algorithms on five objective evaluation metrics while using a smaller model.
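To make the idea of a location-dependent convolution concrete, the following is a minimal sketch of one way an FSC-style layer could be realized in PyTorch. It is not the authors' implementation: the module name `FeatureSpecificConv`, the squeeze-and-excitation-style channel gate, and the 1x1 kernel-prediction head are assumptions used only to illustrate kernels that vary per time-frequency location and per channel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureSpecificConv(nn.Module):
    """Hypothetical sketch of a feature-specific convolution (FSC):
    kernel weights are predicted at every time-frequency location,
    conditioned on global context and per-channel importance."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.channels = channels
        self.kernel_size = kernel_size
        # Per-channel importance from global context (assumption).
        self.channel_gate = nn.Sequential(
            nn.Linear(channels, channels),
            nn.Sigmoid(),
        )
        # Predict a (k*k) kernel for every channel at every T-F bin.
        self.kernel_pred = nn.Conv2d(channels, channels * kernel_size ** 2, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, F) deep feature map from the encoder.
        b, c, t, f = x.shape
        k = self.kernel_size

        # Long-term global context via pooling over all T-F bins.
        context = x.mean(dim=(2, 3))                   # (B, C)
        gate = self.channel_gate(context)              # channel importance
        x_gated = x * gate.view(b, c, 1, 1)

        # Location-dependent kernels, conditioned on the gated features.
        kernels = self.kernel_pred(x_gated)            # (B, C*k*k, T, F)
        kernels = kernels.view(b, c, k * k, t, f)

        # Gather local patches and apply the per-location kernels.
        patches = F.unfold(x, k, padding=k // 2)       # (B, C*k*k, T*F)
        patches = patches.view(b, c, k * k, t, f)
        return (patches * kernels).sum(dim=2)          # (B, C, T, F)


if __name__ == "__main__":
    fsc = FeatureSpecificConv(channels=16)
    feats = torch.randn(2, 16, 100, 64)    # (batch, channels, frames, freq bins)
    print(fsc(feats).shape)                # torch.Size([2, 16, 100, 64])
```

Unlike a standard convolution, whose kernel is shared across all positions, this sketch produces a separate kernel for each time-frequency bin, which is the spatial-adaptivity property the abstract contrasts with spatially agnostic CNN kernels.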
