Abstract
Most conventional speech enhancement methods perform poorly at low signal-to-noise ratios (SNRs). The speech enhancement generative adversarial network (SEGAN) also yields relatively low speech quality despite the large number of parameters in its generator. To address these problems, we propose a speech enhancement method based on a new Wasserstein generative adversarial network architecture (SEWGAN), whose generator and discriminator are built on fully convolutional neural networks (FCNNs) and deep neural networks (DNNs), respectively. To improve generalization, the proposed method is trained on multiple noise types at different SNRs. Experimental results show that the proposed method outperforms both SEGAN and minimum mean square error estimators based on the magnitude-squared spectrum (MMSE-MSS) in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ). The results also show that the proposed method generalizes well in a real-world scenario.
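Training on multiple noise types at different SNRs requires mixing each clean utterance with noise scaled to a target SNR. The following is a minimal sketch of that mixing step using the standard definition SNR = 10 log10(P_clean / P_noise). It is not taken from the paper: the function names `power`, `mix_at_snr`, and `measured_snr_db` are hypothetical, and the abstract specifies neither the SNR values nor the noise types used.

```python
import math


def power(signal):
    """Mean power of a signal: the average of its squared samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` after scaling it to reach the target SNR in dB.

    Hypothetical helper for illustration; both inputs are equal-length
    sequences of float samples.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise must have non-zero power")
    # Noise power needed so that P_clean / P_noise == 10 ** (snr_db / 10).
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    # Amplitude gain is the square root of the power ratio.
    gain = math.sqrt(target_noise_power / p_noise)
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to `clean`, from the residual noise."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))
```

A training set in this style would be built by calling `mix_at_snr` for each combination of clean utterance, noise recording, and SNR level.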