Audio-Aware Spoken Multiple-Choice Question Answering With Pre-Trained Language Models

Chia-Chih Kuo, Shang-Bao Luo, Kuan-Yu Chen

doi:10.1109/taslp.2021.3120638

Abstract

Spoken multiple-choice question answering (SMCQA) requires machines to select the correct choice to answer a question by referring to a passage, where the passage, the question, and the choices are all given as speech. Although the audio may contain useful cues for SMCQA, model development usually relies only on the auto-transcribed text. Thanks to large-scale pre-trained language representation models, such as the bidirectional encoder representations from Transformers (BERT), systems that use only auto-transcribed text can still achieve a certain level of performance. However, previous studies have shown that acoustic-level statistics can offset text inaccuracies caused by automatic speech recognition systems, as well as representation inadequacies in word embedding generators, thereby making the SMCQA system more robust. Following this line of research, this study proposes an audio-aware SMCQA framework. Two different mechanisms are introduced to distill useful cues from speech, and a BERT-based SMCQA framework is then presented. In other words, the proposed framework not only inherits the advantages of the contextualized language representations learned by BERT but also integrates the complementary acoustic-level information distilled from audio with the text-level information. A series of experiments demonstrates remarkable improvements in accuracy over selected baselines and state-of-the-art systems on a published Chinese SMCQA dataset.
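
The general idea of combining text-level and acoustic-level information when picking a choice can be illustrated with a toy sketch. The abstract does not specify the two mechanisms or how fusion is done, so everything below is an assumption made for illustration only: the function names, the hand-written vectors and weights, and the use of simple concatenation followed by a linear scorer. It is not the authors' model, and a real system would use learned BERT and audio representations instead of these toy numbers.

```python
import math
from typing import List, Sequence, Tuple


def fuse(text_vec: Sequence[float], audio_vec: Sequence[float]) -> List[float]:
    """Hypothetical fusion step: concatenate text and audio features."""
    return list(text_vec) + list(audio_vec)


def linear_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Score one choice with a dot product (a stand-in for a trained layer)."""
    return sum(f * w for f, w in zip(features, weights))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the per-choice scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_choice(
    text_vecs: Sequence[Sequence[float]],
    audio_vecs: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Tuple[int, List[float]]:
    """Return the index of the highest-probability choice and all probabilities."""
    scores = [
        linear_score(fuse(t, a), weights) for t, a in zip(text_vecs, audio_vecs)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Toy inputs for four choices (made-up numbers, not real embeddings).
text_vecs = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.3], [0.1, 0.8]]
audio_vecs = [[0.5], [0.1], [0.7], [0.2]]
weights = [1.0, 0.5, 2.0]  # last weight applies to the audio feature

best, probs = select_choice(text_vecs, audio_vecs, weights)
print(best, [round(p, 3) for p in probs])
```

With these toy numbers the text features alone would favour the second choice, while the fused features favour the third, which is the kind of correction by acoustic cues that the abstract describes in general terms.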

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2021
Citations: 1
License type: publisher-specific, author manuscript
