Abstract

This work proposes a speech denoising method based on deep learning. The predictor and target signals of the network are the amplitude spectra of the wavelet-decomposition vectors of the noisy and clean audio signals, respectively, and the network output is the amplitude spectrum of the denoised signal. The regression network takes the predictor as input and is trained to minimize the mean square error between its output and the target. The denoised wavelet-decomposition vector is recovered in the time domain from the output amplitude spectrum and the phase of the wavelet-decomposition vector, and the denoised speech is then obtained by the inverse wavelet transform. This method overcomes the limitation that the time and frequency resolution of the short-time Fourier transform cannot be adjusted. Because the noise energy is reduced gradually during wavelet decomposition, the noise reduction in each frequency band is improved. The experimental results show that the method denoises effectively across the whole frequency band.
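The signal path described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the wavelet family, the number of decomposition levels, or the spectral transform, so a one-level Haar wavelet and a plain DFT are assumed here, the phase is assumed to come from the noisy decomposition vector, and `amplitude_model` is a hypothetical stand-in for the trained regression network.

```python
import cmath
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """One-level Haar wavelet transform (assumed wavelet); len(x) must be even."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x

def dft(v):
    """Naive O(n^2) discrete Fourier transform (assumed spectral transform)."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse DFT; the real part is kept because the signal is real-valued."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def mse(output, target):
    """Mean square error, the training objective named in the abstract."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)

def denoise(noisy, amplitude_model):
    """Wavelet -> amplitude/phase split -> model -> recombine -> inverse wavelet.

    amplitude_model is hypothetical: any callable mapping a list of
    amplitudes to a list of amplitudes of the same length.
    """
    approx, detail = haar_dwt(noisy)
    vec = approx + detail                      # wavelet-decomposition vector
    spec = dft(vec)
    amplitude = [abs(c) for c in spec]         # predictor fed to the network
    phase = [cmath.phase(c) for c in spec]     # phase is reused unchanged
    new_amplitude = amplitude_model(amplitude)
    new_spec = [cmath.rect(a, p) for a, p in zip(new_amplitude, phase)]
    new_vec = idft(new_spec)                   # denoised decomposition vector
    half = len(new_vec) // 2
    return haar_idwt(new_vec[:half], new_vec[half:])

# Usage: a crude placeholder "model" that attenuates every amplitude.
noisy = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 0.0, -2.0]
denoised = denoise(noisy, lambda amp: [0.8 * a for a in amp])
```

In training, `mse` would be evaluated between the model output for a noisy amplitude spectrum and the amplitude spectrum computed the same way from the matching clean signal. With an identity model the pipeline reproduces its input, which is a convenient sanity check that the transforms are inverted correctly.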

Highlights

  • In the actual environment, speech signals are inevitably affected by the noises from the surrounding environment, transmission media, and electrical noise inside the communication equipment. These interferences greatly degrade the performance of the speech processing system and affect the quality of speech

  • Several speech-denoising and speech-enhancement methods have been proposed based on the statistical difference between the speech and noise characteristics, including spectral subtraction [1], based estimation [2], Wiener filtering [3], subspace method [4], nonnegative matrix factorization (NMF) [5], and minimum mean square error (MMSE) [6]

Summary

Introduction

Speech signals are inevitably affected by the noises from the surrounding environment, transmission media, and electrical noise inside the communication equipment. These interferences greatly degrade the performance of the speech processing system and affect the quality of speech. Several speech-denoising and speech-enhancement methods have been proposed based on the statistical difference between the speech and noise characteristics, including spectral subtraction [1], based estimation [2], Wiener filtering [3], subspace method [4], nonnegative matrix factorization (NMF) [5], and minimum mean square error (MMSE) [6]. Most of the filtering methods are limited to window-adding or masking operations in the frequency domain or time domain due to the strong time-frequency coupling between speech signals and noises. It is therefore difficult for these filtering methods to achieve effective signal-noise separation. The constraints on computing power and the size of training data lead to the implementation of relatively small neural networks, limiting denoising performance.
