Abstract
This paper presents a novel self-adaptive approach to speech enhancement in highly nonstationary noise. A two-stage deep neuroevolutionary technique is proposed. The first stage consists of a deep neural network (DNN) method for speech enhancement. Two DNN methods were tested at this stage: a deep complex convolution recurrent network (DCCRN) and a residual long short-term memory neural network (ResLSTM). The ResLSTM network is used as an a priori signal-to-noise ratio (SNR) estimator and is combined with a minimum mean-square error method to perform a preliminary enhancement. The second stage implements a self-adaptive multiband spectral subtraction enhancement method whose parameters are tuned by a genetic algorithm. The proposed two-stage technique is evaluated using objective measures of speech quality and intelligibility. The experiments are carried out on the NOIZEUS noisy speech corpus under real-world stationary, colored, and nonstationary noise conditions at multiple SNR levels. These experiments demonstrate the advantage of a cooperative approach that combines evolutionary and deep learning-based techniques to achieve robust speech enhancement in adverse conditions. In the experimental tests, the proposed two-stage technique outperformed a baseline implementation using a state-of-the-art deep learning approach, with average improvements of 13% and 6% across six noise conditions at input SNRs of −5 dB and 0 dB, respectively.
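As a rough illustration of how an a priori SNR estimate can drive a first-stage enhancement, the sketch below converts a per-bin a priori SNR (in dB) into a spectral gain and applies it to the noisy magnitudes of one frame. It makes several assumptions: the SNR values are taken as given (in the paper they come from the ResLSTM network), the Wiener gain ξ/(1+ξ) is used as a simple stand-in for the minimum mean-square error estimator, whose exact form is not stated here, and the function names are hypothetical.

```python
def db_to_linear(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def wiener_gain(xi):
    """Wiener gain for one frequency bin, given a linear a priori SNR xi >= 0.

    Assumption: used here as a simple stand-in for an MMSE-type gain function.
    """
    return xi / (1.0 + xi)


def enhance_frame(noisy_magnitudes, xi_db):
    """Scale each noisy magnitude by the gain implied by its a priori SNR (dB)."""
    return [
        wiener_gain(db_to_linear(snr_db)) * magnitude
        for magnitude, snr_db in zip(noisy_magnitudes, xi_db)
    ]
```

A bin with a high estimated a priori SNR is passed almost unchanged, while a bin with a low estimate is strongly attenuated.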
Highlights
Increased adaptivity is an important topic in speech enhancement research, particularly for dealing with nonstationary noise
It was deduced that this increase in performance resulted from the more frequent occurrence of higher α_i values
The results showed that a training-independent evolutionary method complemented a training-dependent deep learning method for speech enhancement when used as a postprocessing stage
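To make the role of the α_i values concrete, the following minimal sketch applies multiband power spectral subtraction to a single frame, where each band i has its own over-subtraction factor. It is an illustration under stated assumptions, not the paper's implementation: the band layout, the spectral floor value, and the function name are hypothetical, and the factors are passed in as fixed inputs rather than tuned by a genetic algorithm.

```python
def multiband_spectral_subtraction(noisy_power, noise_power, band_edges, alpha, floor=0.002):
    """Subtract a scaled noise estimate from the noisy power spectrum, band by band.

    noisy_power, noise_power: per-bin power values for one frame.
    band_edges: contiguous bin indices starting at 0; band i covers
                bins band_edges[i] .. band_edges[i + 1] - 1.
    alpha: one over-subtraction factor per band (hypothetical fixed values here).
    floor: fraction of the noisy power kept as a lower bound (assumed value).
    """
    enhanced = []
    for i in range(len(band_edges) - 1):
        lo, hi = band_edges[i], band_edges[i + 1]
        for k in range(lo, hi):
            subtracted = noisy_power[k] - alpha[i] * noise_power[k]
            # Spectral flooring keeps the estimate positive after over-subtraction.
            enhanced.append(max(subtracted, floor * noisy_power[k]))
    return enhanced


# Two bands over four bins: the second band uses a much larger factor.
frame = multiband_spectral_subtraction(
    noisy_power=[4.0, 4.0, 4.0, 4.0],
    noise_power=[1.0, 1.0, 1.0, 1.0],
    band_edges=[0, 2, 4],
    alpha=[1.0, 5.0],
)
```

In this example the first band keeps most of its power, while the second band is over-subtracted and falls back to the spectral floor, which shows how a higher α_i suppresses a band more aggressively.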
Summary
Increased adaptivity is an important topic in speech enhancement research, particularly for dealing with nonstationary noise. The difficulty of a speech enhancement task depends largely on the stationarity of the noise. Noise signals are nondeterministic and are usefully categorized as stationary or nonstationary. A stationary noise signal is generated by a process whose statistical properties do not change over time (e.g., a fan blowing in the background). Suppressing nonstationary noise (e.g., multitalker babble) is more difficult than suppressing stationary noise. Very few speech enhancement methods have been shown to be effective at reducing noise in highly nonstationary environments [1]. The main conventional categories of single-channel algorithms are spectral subtractive