Low Signal-to-noise Ratio Levels Research Articles

We propose a multi-target, signal-to-noise-ratio (SNR)-progressive learning (SNR-PL) framework for regression-based speech enhancement (SE). At low SNR levels, the complicated regression required in SE is often difficult to learn directly. We therefore decompose the original SE problem of mapping noisy to clean speech features, which spans a large SNR gap, into a series of sub-problems, each with a small SNR increment and presumably easier to learn. In our configurations, each hidden layer of the proposed regression neural network is guided to explicitly learn an intermediate target with a specified but small SNR gain. Tested on both deep neural network (DNN) and long short-term memory (LSTM) architectures, SNR-PL consistently outperforms the conventional “black box” DNN framework in both objective measures and model compactness. However, with the best-configured LSTM-based SNR-PL model, we often observe that performance saturates or even degrades as the number of intermediate targets increases, because useful information is lost in dimension reduction when more target layers are involved. To address this information loss, we explore densely connected networks on top of the LSTM structure, in which the input and the preceding intermediate targets are concatenated to learn the next target. Finally, to fully utilize the rich and complementary information in the intermediate targets, we adopt a simple post-processing strategy that further improves performance. On simulated speech data, experimental results for unseen noise types demonstrate that the proposed approach consistently outperforms the conventional LSTM approach on objective measures of speech intelligibility and quality. Furthermore, when evaluated on real data provided by the CHiME-4 Challenge for automatic speech recognition (ASR) of noisy microphone array speech, the proposed approach with intermediate outputs directly improves ASR performance, whereas the conventional LSTM approach increases the word error rate.
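The idea of splitting one large SNR gap into several small steps can be illustrated with a minimal sketch. It assumes, purely for illustration, that each intermediate target is the same clean signal remixed with the same noise at a slightly higher SNR; the abstract does not say how the targets are constructed. The function names, the 5 dB step, and the use of raw samples (the abstract refers to speech features) are all hypothetical choices.

```python
import math
import random


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so the clean-to-noise power ratio equals snr_db, then add it."""
    scale = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]


def progressive_targets(clean, noise, input_snr_db, step_db, n_intermediate):
    """Return the SNR ladder and the signals [input, intermediate targets..., clean].

    Assumption: every stage reuses the same noise realisation, only rescaled.
    """
    snrs = [input_snr_db + k * step_db for k in range(n_intermediate + 1)]
    stages = [mix_at_snr(clean, noise, s) for s in snrs]
    stages.append(list(clean))  # the final target is the clean signal itself
    return snrs, stages


def measured_snr_db(clean, mixture):
    """SNR of a mixture, computed from the residual (mixture minus clean)."""
    residual = [m - c for m, c in zip(mixture, clean)]
    return 10 * math.log10(power(clean) / power(residual))


# Toy data: a sine wave as "speech" and seeded Gaussian samples as noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]

# A -5 dB input with three intermediate targets, each 5 dB cleaner than the last.
snrs, stages = progressive_targets(clean, noise, input_snr_db=-5, step_db=5, n_intermediate=3)
```

In this sketch `stages[0]` plays the role of the network input, `stages[1:-1]` the intermediate targets that successive hidden layers would be guided toward, and `stages[-1]` the final clean target.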

Human speech processing is inherently multi-modal: visual cues (e.g. lip movements) help listeners understand speech in noise. Our recent work [1] has shown that lip-reading driven, audio-visual (AV) speech enhancement can significantly outperform benchmark audio-only approaches at low signal-to-noise ratios (SNRs). However, consistent with our cognitive hypothesis, visual cues were found to be relatively less effective for speech enhancement at high SNRs, that is, at low levels of background noise, where audio-only (A-only) cues worked adequately. Therefore, a more cognitively-inspired, context-aware AV approach is required that contextually utilises both visual and noisy audio features, and thus more effectively accounts for different noisy conditions. In this paper, we introduce a novel context-aware AV speech enhancement framework that contextually exploits AV cues with respect to different operating conditions in order to estimate clean audio, without requiring any prior SNR estimation. In particular, an AV switching module is developed by integrating a convolutional neural network (CNN) and a long short-term memory (LSTM) network; it learns to contextually switch between visual-only (V-only), A-only and both AV cues at low, high and moderate SNR levels, respectively. For testing, the estimated clean audio features are used by an innovative, enhanced visually-derived Wiener filter (EVWF) to filter the noisy speech. The context-aware AV speech enhancement framework is evaluated in dynamic real-world scenarios (including cafe, street, bus, and pedestrians) at SNR levels ranging from low to high, using the benchmark Grid and CHiME-3 corpora. For objective testing, perceptual evaluation of speech quality (PESQ) is used to evaluate the quality of the restored speech. For subjective testing, the standard mean-opinion-score (MOS) method is used. Comparative experimental results show the superior performance of our proposed context-aware AV approach over A-only, V-only, spectral subtraction (SS), and log-minimum mean square error (LMMSE) based speech enhancement methods, at both low and high SNRs. The preliminary findings demonstrate the capability of our novel approach to deal with spectro-temporal variations in real-world noisy environments by contextually exploiting the complementary strengths of audio and visual cues. In conclusion, our contextual deep learning-driven AV framework is posited as a benchmark resource for the multi-modal speech processing and machine learning communities.
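The filtering step can be pictured with the textbook Wiener gain, sketched below. This is not the EVWF itself, whose details the abstract does not give. The sketch assumes additive noise uncorrelated with the speech, so that the per-bin gain is the estimated clean power divided by the noisy power; the function names and the gain floor are hypothetical.

```python
def wiener_gain(clean_power_est, noisy_power, floor=1e-3):
    """Per-bin Wiener-style gain from estimated clean power and observed noisy power.

    Assumes noisy power = clean power + noise power (uncorrelated additive noise),
    so the classic gain P_s / (P_s + P_n) becomes P_s / P_y.
    """
    gains = []
    for s, y in zip(clean_power_est, noisy_power):
        g = s / y if y > 0 else 0.0
        # Clamp to [floor, 1]: an over-estimate of clean power must not amplify the
        # bin, and a small floor avoids zeroing bins completely.
        gains.append(min(1.0, max(floor, g)))
    return gains


def apply_gain(noisy_magnitude, gains):
    """Attenuate each bin of a noisy magnitude spectrum by its gain."""
    return [g * m for g, m in zip(gains, noisy_magnitude)]


# Three toy frequency bins: half speech, no speech, and an over-estimated bin.
gains = wiener_gain([1.0, 0.0, 4.0], [2.0, 1.0, 2.0])
filtered = apply_gain([3.0, 3.0, 3.0], gains)
```

In the framework described above, the clean power estimate would come from the audio-visual model's estimated clean audio features; here it is simply supplied as a list.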

Articles published on Low Signal-to-noise Ratio Levels

IncepSeqNet: Advancing Signal Classification with Multi-Shape Augmentation (Student Abstract)

P2T2: A physically-primed deep-neural-network approach for robust T2 distribution estimation from quantitative T2-weighted MRI.

Precursors for synthetic aperture radar

Blind Space Time Block Coding Categorization over AF Relaying Broadcasts

Highly Sensitive Readout Interface for Real-Time Differential Precision Measurements with Impedance Biosensors.

Second-Order Statistics for STBC Classification Over Amplify-and-Forward Cooperative Systems

Suppressing the Saturated Negative Effects to Recover the Effective Sensing FID Signal From an Overhauser Magnetometer via Segmented Linear Regression

A decrease in physiological arousal accompanied by stable behavioral performance reflects task habituation.

Overlap Sliding Window Algorithm for Better BER in Turbo Decoding

Filtered Multicarrier Waveforms Classification: A Deep Learning-Based Approach

Single Acoustic Sensor-Based Time–Frequency Spectrum Sensing Approach for Land Vehicle Detection

Robust and Fast Temperature Extraction for Brillouin Optical Time-Domain Analyzer by Using Denoising Autoencoder-Based Deep Neural Networks

A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement

Variational Mode Decomposition-Based Threat Classification for Fiber Optic Distributed Acoustic Sensing

1D Convolutional Neural Networks Versus Automatic Classifiers for Known LPI Radar Signals Under White Gaussian Noise

Performance Comparison of Closed-Form Least Squares Algorithms for Hyperbolic 3-D Positioning

Contextual deep learning-based audio-visual switching for speech enhancement in real-world environments

Noise Robust Voice Activity Detection Based on Multi-Layer Feed-Forward Neural Network

Maximum Likelihood Joint Angle and Delay Estimation from Multipath and Multicarrier Transmissions with Application to Indoor Localization over IEEE 802.11ac Radio

Error compensation in indoor positioning systems based on software defined visible light communication
