Effectively pretraining a speech translation decoder with Machine Translation data

Ashkan Alinejad, Anoop Sarkar

doi:10.18653/v1/2020.emnlp-main.644

Abstract

Directly translating from speech to text using an end-to-end approach is still challenging for many language pairs due to insufficient data. Although pretraining the encoder parameters using the Automatic Speech Recognition (ASR) task improves the results in low-resource settings, attempting to use pretrained parameters from the Neural Machine Translation (NMT) task has been largely unsuccessful in previous work. In this paper, we show that by using an adversarial regularizer, we can bring the encoder representations of the ASR and NMT tasks closer even though they are in different modalities, and that this helps us effectively use a pretrained NMT decoder for speech translation.

Highlights

Automatic Speech Translation (AST) aims to directly translate audio signals in the source language into text in the target language.
We analyze the effect of our regularizer in two different settings: (A) when we only have access to AST data and (B) when we can benefit from external data.
Adding external data can boost the performance of the cascaded model. Comparing Tables 2 and 3, we can see that the additional Neural Machine Translation (NMT) and Automatic Speech Recognition (ASR) data improve the translation quality of the cascaded model by +2 BLEU points, while they barely affect the AST model with pretrained encoder and decoder.

Summary

Introduction

Automatic Speech Translation (AST) aims to directly translate audio signals in the source language into text in the target language. While pretraining the encoder with an ASR model, even one trained on a different language, shows promising results (Bansal et al., 2019), using a pretrained MT decoder is not beneficial (Berard et al., 2018; Bansal et al., 2018), improves the results only slightly (Sperber et al., 2019), or in some cases even worsens them (Bahar et al., 2019). One explanation for this phenomenon is that the decoder works well only if its input comes from an encoder that it was trained with (Lample et al., 2018). We show that adding an adversarial regularizer, which brings the encoder representations of the ASR and NMT tasks closer, can improve the BLEU score by +2.0 points.
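To make the idea of an adversarial regularizer more concrete, the following is a minimal sketch of one common formulation, not the authors' implementation. It assumes a linear discriminator that tries to tell mean-pooled speech-encoder states (label 1) from text-encoder states (label 0), and a penalty added to the speech encoder's task loss that is small when the discriminator mistakes speech representations for text. All function names, the pooling choice, and the weight `lam` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mean_pool(states):
    """Average a sequence of encoder state vectors into one fixed-size vector."""
    dim = len(states[0])
    return [sum(s[i] for s in states) / len(states) for i in range(dim)]


def discriminator_prob(weights, bias, vec):
    """Probability assigned by the (assumed linear) discriminator that
    `vec` came from the speech encoder."""
    return sigmoid(sum(w * v for w, v in zip(weights, vec)) + bias)


def discriminator_loss(weights, bias, speech_vec, text_vec, eps=1e-12):
    """Binary cross-entropy for the discriminator: speech -> 1, text -> 0."""
    p_speech = discriminator_prob(weights, bias, speech_vec)
    p_text = discriminator_prob(weights, bias, text_vec)
    return -(math.log(p_speech + eps) + math.log(1.0 - p_text + eps))


def adversarial_regularizer(weights, bias, speech_vec, eps=1e-12):
    """Encoder-side penalty: low when the discriminator believes the
    speech representation is a text representation."""
    p_speech = discriminator_prob(weights, bias, speech_vec)
    return -math.log(1.0 - p_speech + eps)


def total_encoder_loss(task_loss, reg, lam=0.1):
    """Task loss plus the weighted regularizer (lam is an assumed hyperparameter)."""
    return task_loss + lam * reg


# Toy usage: a fixed discriminator and two candidate speech representations.
w, b = [1.0, -1.0], 0.0
far_from_text = mean_pool([[2.0, 0.0], [2.0, 0.0]])
close_to_text = mean_pool([[0.0, 2.0], [0.0, 2.0]])
print(adversarial_regularizer(w, b, far_from_text))
print(adversarial_regularizer(w, b, close_to_text))
```

In a full training loop the discriminator and the encoder would be updated in alternation, the first minimizing `discriminator_loss` and the second minimizing `total_encoder_loss`; the sketch only shows how the two objectives pull in opposite directions.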

End-to-End Speech Translation

Adversarial regularizer

Aligning encoder representations

Dataset

Preprocessing and Evaluation

Training settings

Model settings

Results

Using only AST data

Using both AST and External data

Related Work

Conclusion


Publication Date: Jan 1, 2020
Citations: 30	License type: cc-by

Similar Papers

Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation
25 May 2021

Regularizing cross-attention learning for end-to-end speech translation with ASR and MT attention matrices
Xiaohu Zhao ... Deyi Xiong
Expert Systems With Applications | VOL. 247
23 Jan 2024

Neural Machine Translation: A Review of the Approaches
Kamya Eria ... Manoj Jayabalan
Journal of Computational and Theoretical Nanoscience | VOL. 16
01 Aug 2019

Multilingual Neural Translation
14 Feb 2020
