Abstract
To address the residual background noise left by supervised single-channel speech separation algorithms in non-stationary noise environments, a piecewise time-frequency masking target based on the Wiener filtering principle is proposed and used as the training target of a neural network; it can both track SNR changes and reduce the damage to speech quality. Four features are combined: relative spectral transform and perceptual linear prediction (RASTA-PLP), amplitude modulation spectrogram (AMS), Mel-frequency cepstral coefficients (MFCC), and Gammatone frequency cepstral coefficients (GFCC). The multi-level speech information they extract serves as the training features of the neural network, and a deep neural network (DNN) based speech separation system is then constructed to separate the noisy speech. Experimental results show that, compared with traditional time-frequency masking methods, the piecewise time-frequency masking algorithm improves speech quality and intelligibility, suppresses noise, and achieves better speech separation performance at low SNR.
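As a rough illustration of the idea behind a piecewise Wiener-based training target, the sketch below combines the classical Wiener gain S²/(S² + N²) with a hard decision at low local SNR. The threshold value, floor value, and the exact segmentation rule are assumptions for illustration only; the abstract does not give the paper's actual formula.

```python
import numpy as np

def piecewise_wiener_mask(speech_power, noise_power,
                          snr_threshold_db=-5.0, floor=0.0):
    """Hypothetical piecewise time-frequency mask.

    Above the local-SNR threshold, each T-F unit keeps the soft Wiener
    gain S^2 / (S^2 + N^2), preserving speech quality; below it, the
    unit is hard-masked to a floor, suppressing residual noise at low
    SNR. Threshold and floor are illustrative, not from the paper.
    """
    eps = 1e-12  # avoid division by zero in silent units
    wiener = speech_power / (speech_power + noise_power + eps)
    local_snr_db = 10.0 * np.log10((speech_power + eps) /
                                   (noise_power + eps))
    return np.where(local_snr_db >= snr_threshold_db, wiener, floor)
```

In a supervised setup, a mask of this form would be computed from clean speech and noise during training and used as the DNN's regression target, one value per time-frequency unit.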