Noise Classification Speech Enhancement Generative Adversarial Network

Tao Feng, Peng Zhang, Ye Li, Fuqiang Wang, Shu Li

doi:10.1109/itoec53115.2022.9734565

Abstract

The purpose of speech enhancement is to extract the speech signal from various noise backgrounds and thereby improve its quality. Since its introduction, the Speech Enhancement Generative Adversarial Network (SEGAN) has achieved good results in the field of speech enhancement. However, SEGAN does not enhance speech well at low signal-to-noise ratios, and it generalizes poorly to unknown noise. In this paper, we propose a generative adversarial network speech enhancement method based on noise background classification, which we call NCSEGAN. The inputs are noisy speech signals with a variety of background noises. Mel Frequency Cepstral Coefficient (MFCC) features are extracted from each noisy speech signal, a convolutional neural network classifies its background noise, and the signal is labeled with the resulting noise type. The labeled noisy speech is then sent to the speech enhancement model, which contains several SEGANs. Each SEGAN enhances noisy speech with one particular type of background noise. We evaluate the method in extensive experiments under extremely low signal-to-noise ratio conditions and with unknown noise, using objective evaluation metrics to measure the effectiveness of the model. Compared with the SEGAN model at extremely low signal-to-noise ratios, the proposed model removes noise better and improves every objective metric. With unknown background noise, NCSEGAN also scores better than SEGAN on the objective evaluation metrics, which confirms the effectiveness of the method.
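The classify-then-route structure described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the noise labels "white" and "babble", the energy threshold, and the scaling factors are hypothetical. The toy classifier stands in for the CNN operating on MFCC features, and the toy enhancers stand in for trained SEGAN generators; neither reproduces the models in the paper.

```python
from typing import Callable, Dict, List, Tuple

Signal = List[float]
Classifier = Callable[[Signal], str]
Enhancer = Callable[[Signal], Signal]


def toy_classify(noisy: Signal) -> str:
    """Placeholder for the CNN noise classifier (hypothetical rule).

    The paper classifies MFCC features with a CNN; here a mean-energy
    threshold is used only so the sketch runs without trained models.
    """
    energy = sum(x * x for x in noisy) / len(noisy)
    return "babble" if energy > 0.5 else "white"


# Placeholders for per-noise-type SEGAN generators (hypothetical).
# Each entry would be a generator trained on one background noise type.
TOY_ENHANCERS: Dict[str, Enhancer] = {
    "white": lambda s: [0.5 * x for x in s],
    "babble": lambda s: [0.25 * x for x in s],
}


def route_and_enhance(
    noisy: Signal,
    classify: Classifier,
    enhancers: Dict[str, Enhancer],
) -> Tuple[str, Signal]:
    """Label the noisy signal, then enhance it with the matching model."""
    label = classify(noisy)       # step 1: predict the background noise type
    enhancer = enhancers[label]   # step 2: select the model for that type
    return label, enhancer(noisy) # step 3: enhance with the selected model
```

Calling `route_and_enhance(signal, toy_classify, TOY_ENHANCERS)` returns the predicted noise label together with the output of the enhancer chosen for that label. In this sketch the classifier always returns one of the known labels, so a signal with unfamiliar noise is still routed to one of the existing enhancers; whether the paper handles unknown noise this way is not stated in the abstract.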

Full Text