Abstract

This paper presents a novel self-adaptive approach for speech enhancement in the context of highly nonstationary noise. A two-stage deep neuroevolutionary technique for speech enhancement is proposed. The first stage is composed of a deep neural network (DNN) method for speech enhancement. Two DNN methods were tested at this stage, namely, a deep complex convolution recurrent network (DCCRN) and a residual long short-term memory (ResLSTM) network. The ResLSTM network is used as an a priori signal-to-noise ratio (SNR) estimator and is combined with a minimum mean-square error (MMSE) method to perform a preliminary enhancement. The second stage implements a self-adaptive multiband spectral subtraction enhancement method whose tuning is optimized by a genetic algorithm. The proposed two-stage technique is evaluated using objective measures of speech quality and intelligibility. The experiments are carried out on the NOIZEUS noisy speech corpus under real-world stationary, colored, and nonstationary noise conditions at multiple SNR levels. These experiments demonstrate the advantage of building a cooperative approach that combines evolutionary and deep learning-based techniques to achieve robust speech enhancement in adverse conditions. Indeed, the experimental tests show that the proposed two-stage technique outperformed a baseline implementation of a state-of-the-art deep learning approach by an average improvement of 13% and 6% across six noise conditions at input SNRs of −5 dB and 0 dB, respectively.
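
To make the second stage concrete, the following is a minimal, hypothetical Python sketch of multiband spectral subtraction with per-band over-subtraction factors (the alpha_i values) selected by a toy genetic algorithm. The function names, band layout, parameter bounds, and the fitness function standing in for an objective speech-quality measure are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def multiband_spectral_subtraction(noisy_mag, noise_mag, band_edges, alphas, beta=0.002):
        """Sketch of multiband spectral subtraction (hypothetical, not the authors' code).

        noisy_mag, noise_mag: magnitude spectra, shape (freq_bins, frames).
        band_edges: list of (lo, hi) frequency-bin ranges defining the bands.
        alphas: per-band over-subtraction factors (the alpha_i values that the
                second stage tunes with a genetic algorithm).
        beta: spectral floor that limits musical noise.
        """
        enhanced = np.copy(noisy_mag)
        for (lo, hi), alpha in zip(band_edges, alphas):
            residual = noisy_mag[lo:hi] ** 2 - alpha * noise_mag[lo:hi] ** 2
            floor = beta * noisy_mag[lo:hi] ** 2
            enhanced[lo:hi] = np.sqrt(np.maximum(residual, floor))
        return enhanced

    def ga_tune_alphas(fitness_fn, n_bands, pop_size=20, generations=50,
                       bounds=(1.0, 6.0), mutation_std=0.3, rng=None):
        """Toy genetic algorithm searching for per-band alphas that maximize fitness_fn."""
        rng = np.random.default_rng() if rng is None else rng
        pop = rng.uniform(*bounds, size=(pop_size, n_bands))
        for _ in range(generations):
            scores = np.array([fitness_fn(ind) for ind in pop])
            parents = pop[np.argsort(scores)[-pop_size // 2:]]       # selection: keep best half
            # crossover: blend two random parents, then mutate
            idx_a = rng.integers(0, len(parents), pop_size - len(parents))
            idx_b = rng.integers(0, len(parents), pop_size - len(parents))
            children = 0.5 * (parents[idx_a] + parents[idx_b])
            children += rng.normal(0.0, mutation_std, children.shape)  # mutation
            pop = np.clip(np.vstack([parents, children]), *bounds)
        return pop[np.argmax([fitness_fn(ind) for ind in pop])]

In the paper's setting, the fitness would be derived from objective quality and intelligibility measures of the subtracted output, and the noise estimate would follow from the first-stage DNN's a priori SNR estimate; both are abstracted away in this sketch.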

Highlights

  • Increased adaptivity is an important subject of speech enhancement research, focusing on dealing with nonstationary noises

  • It was deduced that this increase in performance was from the increased frequency of higher αi values

  • The results showed that a training-independent evolutionary method complemented a training-dependent deep learning method for speech enhancement when used as a postprocess


Summary

Introduction

Increased adaptivity is an important subject of speech enhancement research, focusing on dealing with nonstationary noise. The difficulty of a speech enhancement task is largely related to the stationarity of the noise. Noise signals are nondeterministic and are better categorized as either stationary or nonstationary. A stationary noise signal is generated by a process whose statistical properties do not change over time (e.g., a fan blowing in the background). Suppressing nonstationary noise (e.g., multitalker babble) is a more difficult task than suppressing stationary noise. Very few speech enhancement methods have been shown to be effective at reducing noise from highly nonstationary environments [1]. The main conventional categories of single-channel algorithms are spectral subtractive
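
For readers unfamiliar with the spectral-subtractive family mentioned above, the following is a minimal Python sketch of textbook single-channel power spectral subtraction, assuming the noise spectrum can be estimated from the first few speech-free frames. It illustrates the general category only, not the method proposed in this paper, and the parameter values are illustrative.

    import numpy as np
    from scipy.signal import stft, istft

    def basic_spectral_subtraction(noisy, fs, noise_frames=6, alpha=2.0, beta=0.01):
        """Textbook single-channel power spectral subtraction (illustrative only).

        Assumes the first `noise_frames` STFT frames are speech-free so the noise
        power spectrum can be estimated from them. alpha is an over-subtraction
        factor and beta a spectral floor that limits musical noise.
        """
        _, _, X = stft(noisy, fs=fs, nperseg=512)
        noise_psd = np.mean(np.abs(X[:, :noise_frames]) ** 2, axis=1, keepdims=True)
        clean_psd = np.maximum(np.abs(X) ** 2 - alpha * noise_psd,
                               beta * noise_psd)                   # floor the residual power
        X_hat = np.sqrt(clean_psd) * np.exp(1j * np.angle(X))      # reuse the noisy phase
        _, enhanced = istft(X_hat, fs=fs, nperseg=512)
        return enhanced

Because a fixed, full-band over-subtraction factor handles nonstationary and colored noise poorly, multiband variants with adaptively tuned per-band factors, as in the proposed second stage, are a natural extension.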

