Clean Speech Research Articles

The goal of speech enhancement is to restore clean speech in noisy environments. Acoustic scenarios with low signal-to-noise ratios (SNR) make it quite challenging to extract the target speech from its noise. In the current study, to enhance noisy speech, we propose a feature recalibration based multi-scale convolutional encoder-decoder architecture with squeeze temporal convolutional networks (S-TCN) bottleneck. Each multi-scale convolutional layer in encoder and decoder is followed by time-frequency attention module (TFA). The recalibration based multi-scale 2D convolution layers are used to extract local and contextual information. Additionally, the recalibration network is equipped with a gating mechanism to control the flow of information among the layers, enabling weighting of the scaled features for noise suppression and speech retention. The fully connected layer (FC) in the bottleneck part of encoder-decoder contains a few neurons, which capture the global information from the multi-scale 2D convolution layer and reduce parameters. A S-TCN, inspired by the popular temporal convolutional neural network (TCNN), is inserted between the encoder and the decoder to model long-term dependencies in speech. The TFA is a highly efficient network component, that operates through two simultaneous attentions, one focused on time frames, and the other on frequency channels. These attentions work together to explicitly exploit positional information to create a two-dimensional attention map to effectively capture the significant time-frequency distribution of speech. Utilizing the common voice dataset, our proposed model consistently enhances results compared to the current benchmarks, as demonstrated by two extensively utilized objective measures PESQ and STOI. The proposed model shows significant improvements, with average PESQ and STOI scores increasing by 45.7% and 23.8% respectively for seen background noises, and by 43.5% and 21.4% for unseen background noises, when compared to the quality of noisy speech. Tests validate that the proposed approach outperforms numerous cutting-edge algorithms.

We propose a multi-objective joint model of non-negative matrix factorization (NMF) and deep neural network (DNN) with a new loss function for speech enhancement. The proposed loss function (LMOFD) is a weighted combination of a frequency differential spectrum mean squared error (MSE)-based loss function (LFD) and a multi-objective MSE loss function LMO. The conventional MSE loss function computes the discrepancy between the estimated speech and clean speech across all frequencies, disregarding the process of changing amplitude in the frequency domain which contains valuable information. The differential spectrum representation retains spectral peaks that carry important information. Using this representation helps to ensure that this information in the speech signal is reserved. Also, on the other hand, noise spectra typically have a flat shape and as the differential operation makes the flat spectral partly close to zero, the differential spectrum is resistant to noises with smooth structures. Thus, we propose using a frequency-differentiated loss function that considers the magnitude spectrum differentiations between the neighboring frequency bins in each time frame. This approach maintains the spectrum variations of the objective signal in the frequency domain, which can effectively reduce the noise deterioration effects. The multi-objective MSE term LMO is a combined two-loss function related to the NMF coefficients which are the intermediate output targets, and the original spectral signals as the actual output targets. The use of encoded NMF coefficients as low-dimensional structural features for DNN serves as prior knowledge and helps the learning process. LMO is used beside LFD to take advantage of both the properties of the original and the differential spectrum in the training loss function. Moreover, a DNN-based noise classification and fusion strategy (NCF) is proposed to exploit a discriminative model for noise reduction. The experiments reveal the improvements of the proposed approach compared to the previous methods.

Clean Speech Research Articles

Related Topics

Articles published on Clean Speech

End‐to‐end speech‐denoising deep neural network based on residual‐attention gated linear units

A dual-region speech enhancement method based on voiceprint segmentation

Noise Reduction Using Sparsity Constrained and Regularized Iterative Thresholding Algorithm and Dictionary

TRNet: Two-level Refinement Network leveraging speech enhancement for noise robust speech emotion recognition

Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge

Speech Emotion Recognition under Noisy Environments with SNR Down to −6 dB Using Multi-Decoder Wave-U-Net

A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments

Multi scale encoder-decoder network with Time Frequency Attention and S-TCN for single channel speech enhancement

Visual Hallucination Elevates Speech Recognition

Deep Speech Denoising with Minimal Dependence on Clean Speech Data

Speaker identification under noisy conditions using hybrid convolutional neural network and gated recurrent unit

Performance analysis of a dilated attention fast GAN for speech enhancement

A noise-robust voice conversion method with controllable background sounds

Recalling-Enhanced Recurrent Neural Network optimized with Chimp Optimization Algorithm based speech enhancement for hearing aids

Zero-shot test time adaptation via knowledge distillation for personalized speech denoising and dereverberation.

ElectrodeNet—A Deep-Learning-Based Sound Coding Strategy for Cochlear Implants

Speech Enhancement Using Joint DNN-NMF Model Learned with Multi-Objective Frequency Differential Spectrum Loss Function

Decoupling-style monaural speech enhancement with a triple-branch cross-domain fusion network

Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech.

Battling cultural bias within hate speech detection: An experimental correlation analysis

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Clean Speech Research Articles

Related Topics

Articles published on Clean Speech

End‐to‐end speech‐denoising deep neural network based on residual‐attention gated linear units

A dual-region speech enhancement method based on voiceprint segmentation

Noise Reduction Using Sparsity Constrained and Regularized Iterative Thresholding Algorithm and Dictionary

TRNet: Two-level Refinement Network leveraging speech enhancement for noise robust speech emotion recognition

Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge

Speech Emotion Recognition under Noisy Environments with SNR Down to −6 dB Using Multi-Decoder Wave-U-Net

A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments

Multi scale encoder-decoder network with Time Frequency Attention and S-TCN for single channel speech enhancement

Visual Hallucination Elevates Speech Recognition

Deep Speech Denoising with Minimal Dependence on Clean Speech Data

Speaker identification under noisy conditions using hybrid convolutional neural network and gated recurrent unit

Performance analysis of a dilated attention fast GAN for speech enhancement

A noise-robust voice conversion method with controllable background sounds

Recalling-Enhanced Recurrent Neural Network optimized with Chimp Optimization Algorithm based speech enhancement for hearing aids

Zero-shot test time adaptation via knowledge distillation for personalized speech denoising and dereverberation.

ElectrodeNet—A Deep-Learning-Based Sound Coding Strategy for Cochlear Implants

Speech Enhancement Using Joint DNN-NMF Model Learned with Multi-Objective Frequency Differential Spectrum Loss Function

Decoupling-style monaural speech enhancement with a triple-branch cross-domain fusion network

Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech.

Battling cultural bias within hate speech detection: An experimental correlation analysis