Abstract

Robustness against background noise and reverberation is essential for many real-world speech-based applications. One way to achieve this robustness is to employ a speech enhancement front-end that, independently of the back-end, removes the environmental perturbations from the target speech signal. However, although such a front-end typically improves speech quality and intelligibility, it tends to introduce distortions that deteriorate the performance of subsequent processing modules. In this paper, we investigate strategies for jointly training neural models for both speech enhancement and the back-end, optimizing a combined loss function. In this way, the enhancement front-end is guided by the back-end to provide more effective enhancement. Differently from typical state-of-the-art approaches relying on spectral features or neural embeddings, we operate in the time domain, processing raw waveforms in both components. As an application scenario, we consider intent classification in noisy environments. In particular, the front-end speech enhancement module is based on Wave-U-Net, while the intent classifier is implemented as a temporal convolutional network. Exhaustive experiments are reported on versions of the Fluent Speech Commands corpus contaminated with noises from the Microsoft Scalable Noisy Speech Dataset, shedding light on, and providing insight into, the most promising training approaches.
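
As a rough illustration of the joint training strategy described in the abstract, the PyTorch sketch below chains a waveform enhancement network into an intent classifier and back-propagates a combined loss through both components. The module definitions (WaveEnhancer, IntentTCN), the loss weighting factor alpha, and all hyper-parameters are illustrative placeholders, not the actual Wave-U-Net and temporal convolutional network configurations used in the paper.

    import torch
    import torch.nn as nn

    # Stand-ins for the Wave-U-Net enhancer and the temporal convolutional
    # intent classifier; only the joint-training mechanics matter here.
    class WaveEnhancer(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Conv1d(1, 1, kernel_size=9, padding=4)

        def forward(self, noisy_wave):            # (batch, 1, samples)
            return self.net(noisy_wave)           # enhanced waveform

    class IntentTCN(nn.Module):
        def __init__(self, num_intents):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv1d(1, 64, kernel_size=8, stride=4), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1))
            self.head = nn.Linear(64, num_intents)

        def forward(self, wave):
            h = self.encoder(wave).squeeze(-1)    # (batch, 64)
            return self.head(h)                   # intent logits

    enhancer = WaveEnhancer()
    classifier = IntentTCN(num_intents=31)        # number of intent classes in the corpus
    optimizer = torch.optim.Adam(
        list(enhancer.parameters()) + list(classifier.parameters()), lr=1e-4)

    enh_loss = nn.L1Loss()                        # waveform-level enhancement loss
    cls_loss = nn.CrossEntropyLoss()              # intent classification loss
    alpha = 0.5                                   # hypothetical loss weighting

    def joint_step(noisy, clean, intent_labels):
        optimizer.zero_grad()
        enhanced = enhancer(noisy)                # front-end works on raw waveforms
        logits = classifier(enhanced)             # back-end sees the enhanced signal
        # Combined objective: the back-end loss guides the enhancement front-end.
        loss = (alpha * enh_loss(enhanced, clean)
                + (1 - alpha) * cls_loss(logits, intent_labels))
        loss.backward()                           # gradients flow through both models
        optimizer.step()
        return loss.item()

With a setup of this kind, the front-end is no longer optimized purely for signal fidelity: the classification term penalizes enhancement artifacts that hurt the intent classifier, which is the core motivation for joint training.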

Highlights

  • The use of audio-visual platforms (e.g., Microsoft Teams, Google Meet, Zoom) for smart working, remote collaboration, and many other applications has been growing exponentially

  • The strategy proposed in this paper jointly adjusts the parameters of a neural speech enhancement model and a neural model designed for a specific task (e.g., Automatic Speech Recognition (ASR), voice activity detection, or intent classification)

  • In this paper we propose end-to-end joint training approaches for robust intent classification in noisy environments


Summary

Introduction

The use of audio-visual platforms (e.g., Microsoft Teams, Google Meet, Zoom) for smart working, remote collaboration, and many other applications has been growing exponentially. In these cases, the speech signal is the predominant tool for communicating and sharing ideas between people [1]. Many speech applications, like Automatic Speech Recognition (ASR), suffer in the presence of adverse acoustic conditions that deteriorate speech quality and intelligibility, leading to considerable performance drops [6,7], especially at low signal-to-noise ratios (SNRs). A possible approach is to train or adapt the models on noisy data [8].
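
As a concrete example of contaminating clean speech with noise, the following NumPy sketch mixes a noise recording into an utterance at a chosen signal-to-noise ratio. It is a generic illustration of SNR-based mixing, not the exact contamination procedure used to build the noisy versions of the corpus.

    import numpy as np

    def mix_at_snr(clean, noise, snr_db):
        """Mix `noise` into `clean` at the target SNR (in dB).

        Both arguments are 1-D float arrays at the same sampling rate;
        the noise is tiled/cropped to the length of the clean utterance.
        """
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)[:len(clean)]

        # Scale the noise so the mixture reaches the requested SNR.
        clean_power = np.mean(clean ** 2) + 1e-12
        noise_power = np.mean(noise ** 2) + 1e-12
        target_noise_power = clean_power / (10 ** (snr_db / 10))
        noise = noise * np.sqrt(target_noise_power / noise_power)

        return clean + noise

    # Example: create a heavily degraded utterance at 0 dB SNR.
    # clean, noise = ...  (loaded waveforms, hypothetical)
    # noisy = mix_at_snr(clean, noise, snr_db=0.0)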


