Abstract

Enhancing speech quality and intelligibility for automatic speech recognition (ASR) is an important goal in designing speech enhancement (SE) systems. However, an SE front end does not guarantee improved ASR performance, owing to the discrepancy between the training objectives of the two systems. Therefore, recent studies have incorporated ASR information into SE by jointly training the two systems. Although these approaches improve performance, they are inefficient because combining the two networks results in large model sizes. To address this limitation, we propose an efficient method that conditions the SE system on CTC-based ASR probabilities via feature-wise linear modulation (FiLM). The proposed model stacks a FiLM layer with selective learning on each temporal convolutional network of the SE estimation module. This allows the SE network to adaptively select ASR information based on the relationship between context and acoustic information. The proposed method improves both SE and ASR performance, yielding greater robustness to noise with only a small increase in the number of parameters.
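To make the conditioning mechanism concrete, the sketch below shows one way a FiLM layer could modulate SE features with CTC-based ASR posteriors. The tensor shapes, the linear projections for the scale and shift, and the scalar gate standing in for "selective learning" are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class FiLMLayer(nn.Module):
    """Feature-wise linear modulation of SE features using conditioning
    derived from CTC-based ASR posteriors. Dimensions and the gating
    term (a stand-in for the paper's "selective learning") are
    illustrative assumptions."""

    def __init__(self, feat_dim: int, cond_dim: int):
        super().__init__()
        self.gamma = nn.Linear(cond_dim, feat_dim)  # per-channel scale
        self.beta = nn.Linear(cond_dim, feat_dim)   # per-channel shift
        # Learnable scalar gate; lets the network down-weight ASR
        # information when it conflicts with the acoustic evidence.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x:    (batch, time, feat_dim) SE features from one TCN block
        # cond: (batch, time, cond_dim) CTC posterior features
        modulated = self.gamma(cond) * x + self.beta(cond)  # FiLM: g*x + b
        # Sigmoid gate blends the modulated and original features.
        alpha = torch.sigmoid(self.gate)
        return alpha * modulated + (1 - alpha) * x
```

In this hypothetical setup, one such layer would be inserted after each TCN block of the SE estimation module, so that every block can decide independently how much ASR context to mix into its acoustic features.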
