Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR

Jiatong Shi, Paola Garcia, Chan-Jan Hsu, Dongji Gao, Holam Chung, Ann Lee, Hung-Yi Lee, Shinji Watanabe

doi:10.1109/icassp49357.2023.10096827

Abstract

Spoken language understanding (SLU) aims to extract high-level semantics from spoken utterances. Previous works have investigated the use of speech self-supervised models and textual pre-trained models, which have yielded reasonable improvements on various SLU tasks. However, because of the modality mismatch between speech signals and text tokens, previous methods usually require complex framework designs. This work proposes a simple yet efficient unsupervised paradigm that connects speech and textual pre-trained models, resulting in an unsupervised speech-to-semantic pre-trained model for various SLU tasks. Specifically, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges the different modalities used in speech and textual pre-trained models. Our experiments show that unsupervised ASR by itself can improve the representations from speech self-supervised models. More importantly, it proves to be an efficient connector between speech and textual pre-trained models, improving performance on five different SLU tasks. Notably, on spoken question answering, we achieve state-of-the-art results on the challenging NMSQA benchmark.
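
The connector idea described in the abstract can be pictured as a three-stage composition: a speech model produces frame-level features, an unsupervised ASR stage turns those features into discrete text-like tokens, and a textual model consumes the tokens. The sketch below only illustrates that data flow. Every name in it (`SpeechToSemantics`, `toy_speech_encoder`, `toy_unsupervised_asr`, `toy_text_model`) is a hypothetical toy stand-in, not the authors' models or any library's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frames = List[List[float]]  # one feature vector per speech frame


@dataclass
class SpeechToSemantics:
    """Chains speech encoder -> unsupervised ASR -> text model (all hypothetical)."""
    speech_encoder: Callable[[Sequence[float]], Frames]
    unsupervised_asr: Callable[[Frames], List[str]]
    text_model: Callable[[List[str]], List[float]]

    def __call__(self, waveform: Sequence[float]) -> List[float]:
        frames = self.speech_encoder(waveform)   # speech modality
        tokens = self.unsupervised_asr(frames)   # connector: frames -> tokens
        return self.text_model(tokens)           # text modality


def toy_speech_encoder(waveform: Sequence[float], frame_size: int = 4) -> Frames:
    # Toy stand-in: average each fixed-size chunk into a 1-dimensional "feature".
    chunks = [waveform[i:i + frame_size] for i in range(0, len(waveform), frame_size)]
    return [[sum(chunk) / len(chunk)] for chunk in chunks]


def toy_unsupervised_asr(frames: Frames) -> List[str]:
    # Toy stand-in: label each frame, then merge consecutive repeated labels.
    labels = ["hi" if frame[0] > 0 else "lo" for frame in frames]
    return [lab for i, lab in enumerate(labels) if i == 0 or lab != labels[i - 1]]


def toy_text_model(tokens: List[str]) -> List[float]:
    # Toy stand-in: a two-number "semantic" summary of the token sequence.
    return [float(len(tokens)), float(tokens.count("hi"))]


model = SpeechToSemantics(toy_speech_encoder, toy_unsupervised_asr, toy_text_model)
```

Because the stages only communicate through frames and tokens, any one of them could in principle be swapped for a different component without changing the others, which is the design point the abstract emphasizes.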

Similar Papers

Benefits of pre-trained mono- and cross-lingual speech representations for spoken language understanding of Dutch dysarthric speech
Pu Wang ... Hugo Van Hamme
EURASIP Journal on Audio, Speech, and Music Processing | VOL. 2023
07 Apr 2023

Joint Spoken Language Understanding and Domain Adaptive Language Modeling
Huifeng Zhang ... Shuai Fan
01 Jan 2018

Where Are We in Semantic Concept Extraction for Spoken Language Understanding?
Sahar Ghannay ... Gaëlle Laperrière
01 Jan 2020

Meta Auxiliary Learning for Low-resource Spoken Language Understanding
Yingying Gao ... Shilei Zhang
18 Sep 2022
