Abstract

Voice interaction is increasingly used in smart home appliances. Real-world environments contain many types of noise, so speech enhancement must handle diverse noisy-speech scenarios and run in real time. Traditional speech noise reduction first estimates the noise power spectrum and then derives a spectral gain from it. Noise estimators such as minima controlled recursive averaging (MCRA) handle only stationary environmental noise and cannot track noise whose power spectrum fluctuates strongly over short durations. Large deep-learning models can estimate the power spectrum of many noise types, but their parameter counts are too high for real-time processing. In this paper, we propose a method that combines deep learning with traditional signal processing: a new low-parameter model, the tiny deep convolutional recurrent network (TDCRN), estimates the power spectrum of various types of noise, and the speech gain is then computed from that power spectrum. Experimental results indicate that, compared with the traditional technique and a complicated deep-learning model, the proposed method, with only 0.29M parameters, increases PESQ by more than 0.6, STOI by more than 0.2, and the wake-up rate by more than 6%.
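
The second stage of the pipeline described above, turning an estimated noise power spectrum into a per-bin gain, can be illustrated with a minimal sketch. The abstract does not state which gain rule the paper uses, so the Wiener-style rule with a gain floor below is an assumption, and the names `spectral_gain`, `enhance_frame`, and `gain_floor` are hypothetical. In the proposed method the noise power spectrum would come from TDCRN; here it is simply passed in as a list.

```python
def spectral_gain(noisy_power, noise_power, gain_floor=0.1):
    """Per-bin Wiener-style gain from noisy and estimated noise power.

    Assumption: the paper's actual gain rule is not given in the abstract;
    this is a generic textbook rule used for illustration only.
    """
    gains = []
    for y_pow, n_pow in zip(noisy_power, noise_power):
        # A posteriori SNR: noisy power relative to the estimated noise power.
        post_snr = y_pow / max(n_pow, 1e-12)
        # Simple a priori SNR estimate, clamped at zero.
        prior_snr = max(post_snr - 1.0, 0.0)
        # Wiener gain, limited below by a floor to reduce musical noise.
        gain = prior_snr / (prior_snr + 1.0)
        gains.append(min(max(gain, gain_floor), 1.0))
    return gains


def enhance_frame(noisy_spectrum, noise_power, gain_floor=0.1):
    """Apply the gain to one frame of complex spectrum bins."""
    noisy_power = [abs(y) ** 2 for y in noisy_spectrum]
    gains = spectral_gain(noisy_power, noise_power, gain_floor)
    return [g * y for g, y in zip(gains, noisy_spectrum)]
```

Bins where the noisy power barely exceeds the estimated noise power are attenuated down to the floor, while bins dominated by speech pass nearly unchanged. The quality of the result therefore depends directly on how well the noise power spectrum tracks non-stationary noise, which is the part the paper assigns to the neural model.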
