Integrating Multiple ASR Systems into NLP Backend with Attention Fusion

Takatomo Kano, Shinji Watanabe, Marc Delcroix, Atsunori Ogawa

doi:10.1109/icassp43922.2022.9746699

Abstract

Spoken language processing (SLP) tasks such as speech summarization and speech translation can be handled by cascade models, which combine an automatic speech recognition (ASR) frontend with a natural language processing (NLP) backend such as machine translation (MT) or text summarization (TS). With this cascade approach, we can exploit large non-paired datasets to train state-of-the-art models for each module independently. However, ASR errors directly degrade the performance of the NLP backend. In this paper, we reduce the impact of ASR errors on the NLP backend by combining transcriptions from multiple ASR systems. Recognizer output voting error reduction (ROVER) is a widely used technique for system combination. Although ROVER improves ASR performance, its combination process is not optimized for backend tasks. We propose a system combination method that resembles ROVER but uses attention fusion to align and combine multiple ASR hypotheses. This allows the combination process to be optimized for the backend NLP task without changing the ASR frontend. The proposed technique is general and can be applied to various SLP tasks. We confirm its effectiveness in both speech summarization and speech translation experiments.
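
The abstract does not spell out the fusion architecture, so the following is only a minimal sketch of the general idea of aligning and combining several ASR hypotheses with attention. It assumes each hypothesis is already a sequence of token embedding vectors, uses plain scaled dot-product attention with no learned parameters, and combines the aligned streams by simple averaging. The function names `attend` and `fuse_hypotheses` are hypothetical and are not taken from the paper; in a trained system the projections and the combination would be learned jointly with the NLP backend.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(queries, keys_values):
    """Soft-align keys_values to queries with scaled dot-product attention.

    Returns one vector per query: a weighted average of keys_values.
    Assumption: keys double as values, and no learned projections are used.
    """
    d = len(queries[0])
    aligned = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys_values])
        aligned.append(
            [sum(w * k[i] for w, k in zip(weights, keys_values)) for i in range(d)]
        )
    return aligned


def fuse_hypotheses(primary, others):
    """Align every other hypothesis to the primary one, then average per position.

    Assumption: averaging stands in for whatever learned combination
    a real model would apply.
    """
    streams = [primary] + [attend(primary, other) for other in others]
    n = len(streams)
    d = len(primary[0])
    return [
        [sum(s[t][i] for s in streams) / n for i in range(d)]
        for t in range(len(primary))
    ]


# Toy example: a one-token primary hypothesis fused with a one-token alternative.
print(fuse_hypotheses([[1.0, 0.0]], [[[0.0, 1.0]]]))  # [[0.5, 0.5]]
```

Because the alignment is soft, the hypotheses may have different lengths; the fused sequence always has the length of the primary hypothesis. Unlike ROVER's fixed voting, every step here is differentiable, which is what would allow the combination to be trained against the backend task's loss.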

Similar Papers

Attention-Based Multi-Hypothesis Fusion for Speech Summarization
Takatomo Kano ... Shinji Watanabe
13 Dec 2021

A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)
J. G. Fiscus
14 Dec 1997

Combined speech enhancement and auditory modelling for robust distributed speech recognition
Ronan Flynn ... Edward Jones
Speech Communication | VOL. 50
20 May 2008

A SYSTEM COMBINATION FOR MALAY BROADCAST NEWS TRANSCRIPTION
Zainab A Khalaf ... Basem H A Ahmed
Jurnal Teknologi | VOL. 77
30 Nov 2015
