Speech Enhancement Task Research Articles

Recent speech enhancement research has shown that deep learning techniques are very effective in removing background noise. Many deep neural networks are being proposed, showing promising results for improving overall speech perception. The Deep Multilayer Perceptron, Convolutional Neural Networks, and the Denoising Autoencoder are well-established architectures for speech enhancement; however, choosing between different deep learning models has been mainly empirical. Consequently, a comparative analysis is needed between these three architecture types in order to show the factors affecting their performance. In this paper, this analysis is presented by comparing seven deep learning models that belong to these three categories. The comparison includes evaluating the performance in terms of the overall quality of the output speech using five objective evaluation metrics and a subjective evaluation with 23 listeners; the ability to deal with challenging noise conditions; generalization ability; complexity; and, processing time. Further analysis is then provided while using two different approaches. The first approach investigates how the performance is affected by changing network hyperparameters and the structure of the data, including the Lombard effect. While the second approach interprets the results by visualizing the spectrogram of the output layer of all the investigated models, and the spectrograms of the hidden layers of the convolutional neural network architecture. Finally, a general evaluation is performed for supervised deep learning-based speech enhancement while using SWOC analysis, to discuss the technique’s Strengths, Weaknesses, Opportunities, and Challenges. The results of this paper contribute to the understanding of how different deep neural networks perform the speech enhancement task, highlight the strengths and weaknesses of each architecture, and provide recommendations for achieving better performance. This work facilitates the development of better deep neural networks for speech enhancement in the future.

Recently, deep neural networks (DNNs) have become the mainstream strategy for speech enhancement task because it can achieve the higher speech quality and intelligibility than the traditional methods. However, these DNN-based methods always need a large number of parallel corpus consisting of clean speech and noise to produce noisy data for the training of the DNN in order to improve the generalization of the network. As a result, this implies that many noisy speech signals that are collected in real environment cannot be used to train the DNN because of the lack of corresponding clean speech and noise. Additionally, as we know, noise varies with the time and scenario, so we cannot obtain parallel speech and noise due to infinite noise data and some limited speech data. Thus, the network training with unparallel speech and noise data is essential for the generalization of the network. To address this problem, we propose a novel parallel-data-free speech enhancement method, in which the cycle-consistent generative adversarial network (CycleGAN) and multi-objective learning are employed. Our method is also able to make best use of the benefits of multi-objective learning. On the training stage, we utilize two different encoders to encode the features of clean speech and noisy speech, respectively. Then, two forward generators are immediately used to predict the ideal time-frequency (T-F) mask and log-power spectrum (LPS) of clean speech. Two inverse generators are applied to map the magnitude spectrum (MS) and LPS of noisy speech, respectively. In addition, four discriminators are used to distinguish the real speech features from the generated features. Two encoders, four generators and four discriminators are simultaneously trained by using adversarial, identity-mapping, latent similarity and cycle-consistent loss. On the test stage, we directly utilize the forward generators and encoders to acquire the enhanced speech. The experimental results indicate that the proposed approach is able to achieve the better speech enhancement performance than the reference methods. Moreover, the proposed method is also effective to improve speech quality and intelligibility when the networks are trained under the parallel data.

Speech Enhancement Task Research Articles

Related Topics

Articles published on Speech Enhancement Task

Improved Lite Audio-Visual Speech Enhancement

Monaural Speech Enhancement Using a Multi-Branch Temporal Convolutional Network

Improving Frame-Online Neural Speech Enhancement With Overlapped-Frame Prediction

Lightweight End-to-End Speech Enhancement Generative Adversarial Network Using Sinc Convolutions

Phase sensitive masking-based single channel speech enhancement using conditional generative adversarial network

Message Passing-Based Inference for Time-Varying Autoregressive Models.

Deep ad-hoc beamforming

On the Design of Differential Kronecker Product Beamformers

Meta-SE: A Meta-Learning Framework for Few-Shot Speech Enhancement

Formula omitted]-law SGAN for generating spectra with more details in speech enhancement

An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement

Reconstructing fundamental frequency from noisy speech using initialized autoencoders

Improving speech enhancement by focusing on smaller values using relative loss

A multi-objective learning speech enhancement algorithm based on IRM post-processing with joint estimation of SCNN and TCNN

Embedding Encoder-Decoder With Attention Mechanism for Monaural Speech Enhancement

Combining Multi-Perspective Attention Mechanism With Convolutional Networks for Monaural Speech Enhancement

Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks

A Parallel-Data-Free Speech Enhancement Method Using Multi-Objective Learning Cycle-Consistent Generative Adversarial Network

Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function

Enhancing Target Speech Based on Nonlinear Soft Masking Using a Single Acoustic Vector Sensor

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Speech Enhancement Task Research Articles

Related Topics

Articles published on Speech Enhancement Task

Improved Lite Audio-Visual Speech Enhancement

Monaural Speech Enhancement Using a Multi-Branch Temporal Convolutional Network

Improving Frame-Online Neural Speech Enhancement With Overlapped-Frame Prediction

Lightweight End-to-End Speech Enhancement Generative Adversarial Network Using Sinc Convolutions

Phase sensitive masking-based single channel speech enhancement using conditional generative adversarial network

Message Passing-Based Inference for Time-Varying Autoregressive Models.

Deep ad-hoc beamforming

On the Design of Differential Kronecker Product Beamformers

Meta-SE: A Meta-Learning Framework for Few-Shot Speech Enhancement

Formula omitted]-law SGAN for generating spectra with more details in speech enhancement

An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement

Reconstructing fundamental frequency from noisy speech using initialized autoencoders

Improving speech enhancement by focusing on smaller values using relative loss

A multi-objective learning speech enhancement algorithm based on IRM post-processing with joint estimation of SCNN and TCNN

Embedding Encoder-Decoder With Attention Mechanism for Monaural Speech Enhancement

Combining Multi-Perspective Attention Mechanism With Convolutional Networks for Monaural Speech Enhancement

Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks

A Parallel-Data-Free Speech Enhancement Method Using Multi-Objective Learning Cycle-Consistent Generative Adversarial Network

Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function

Enhancing Target Speech Based on Nonlinear Soft Masking Using a Single Acoustic Vector Sensor