Abstract
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive and time-consuming. Recent studies achieved promising results by using pre-trained models in low-resource scenarios. Inspired by this, we ask: which (if any) pre-training strategies can improve performance across SLU benchmarks? To answer this question, we employ four types of pre-trained models and their combinations for SLU. We leverage self-supervised speech models and language models (LMs) pre-trained on large quantities of unpaired data to extract strong speech and text representations. We also explore using supervised models pre-trained on larger external automatic speech recognition (ASR) or SLU corpora. We conduct extensive experiments on the SLU Evaluation (SLUE) benchmark and observe self-supervised pre-trained models to be more powerful, with pre-trained LMs and speech models being most beneficial for the Sentiment Analysis and Named Entity Recognition tasks, respectively. Our code and models will be publicly available as part of the ESPnet-SLU toolkit.
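To make the general setup concrete, the sketch below shows one way to reuse a self-supervised speech encoder for an SLU task such as sentiment analysis: frame-level representations are extracted from a pre-trained encoder and fed to a small classification head. This is a minimal illustration under assumed choices (a HuggingFace wav2vec 2.0 checkpoint, mean pooling, a 3-class linear head), not the authors' ESPnet-SLU recipe.

# Minimal sketch (not the paper's ESPnet-SLU pipeline): extract self-supervised
# speech representations and classify sentiment with a pooled linear head.
# The checkpoint name and the 3-class head are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

class SentimentHead(nn.Module):
    """Mean-pool frame-level features, then apply a linear classifier."""
    def __init__(self, hidden_size: int = 768, num_classes: int = 3):
        super().__init__()
        self.proj = nn.Linear(hidden_size, num_classes)

    def forward(self, frame_features: torch.Tensor) -> torch.Tensor:
        pooled = frame_features.mean(dim=1)   # (batch, hidden)
        return self.proj(pooled)              # (batch, num_classes)

head = SentimentHead()

# One second of dummy 16 kHz audio stands in for a real utterance.
waveform = torch.zeros(16000)
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    frame_features = encoder(**inputs).last_hidden_state  # (1, frames, 768)
logits = head(frame_features)                              # (1, 3)

In a low-resource setting, only the head (and optionally the top encoder layers) would be fine-tuned on the labeled SLU data, which is the scenario the paper's pre-training comparison targets.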