Abstract

This paper describes Edinburgh’s submissions to the IWSLT2021 multilingual speech translation (ST) task. We aim to improve multilingual translation and zero-shot performance in the constrained setting (without using any extra training data) through methods that encourage transfer learning and larger-capacity modeling with advanced neural components. We build our end-to-end multilingual ST model on the Transformer, integrating techniques including adaptive speech feature selection, language-specific modeling, multi-task learning, deep and big Transformer, sparsified linear attention and root mean square layer normalization. We adopt data augmentation for ST using machine translation models, which converts the zero-shot problem into a zero-resource one. Experimental results show that these methods deliver substantial improvements, surpassing the official baseline by > 15 average BLEU and outperforming our cascading system by > 2 average BLEU. Our final submission achieves competitive performance (runner-up).
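The data augmentation mentioned above can be read as pseudo-labelling: a machine translation model trained on the permitted text pairs translates the transcripts of existing speech data into otherwise unseen target languages, so each zero-shot direction gains synthetic supervision. A minimal sketch of that idea follows; `synthesize_st_pairs`, `mt_translate`, and the corpus layout are hypothetical stand-ins, not the authors' actual pipeline.

```python
# Hypothetical sketch of zero-shot -> zero-resource data augmentation.
def synthesize_st_pairs(st_corpus, mt_translate, tgt_lang):
    """Create synthetic ST pairs for a direction with no training data.

    st_corpus:    iterable of (audio, source_transcript) pairs
    mt_translate: hypothetical MT function, (text, tgt_lang) -> text
    tgt_lang:     a target language unseen in the original ST data
    """
    for audio, transcript in st_corpus:
        yield audio, mt_translate(transcript, tgt_lang)
```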

Highlights

  • Although end-to-end (E2E) speech translation (ST) has achieved great success in recent years, outperforming its cascading counterpart and delivering state-of-the-art performance on several benchmarks (Ansari et al., 2020; Zhang et al., 2020a; Zhao et al., 2020), it still suffers from the relatively low amount of dedicated speech-to-translation parallel training data (Salesky et al., 2021).

  • Our study demonstrates that rectified linear attention (ReLA) generalizes well to ST (a sketch of the core operation follows this list).

  • Zhang and Sennrich (2019b) propose root mean square layer normalization (RMSNorm), which regularizes activations using the root mean square statistic alone and is a drop-in replacement for LayerNorm (see the sketch after this list).
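As a rough illustration of ReLA, the sketch below replaces the softmax over scaled dot-product attention scores with ReLU, which yields sparse (exactly zero) weights, and compensates for the missing normalization with an RMS rescaling of the output. The learned gain and gating of the full ReLA method are omitted, and the tensor shapes are illustrative assumptions, so this is a sketch of the idea rather than the authors' implementation.

```python
import math
import torch

def rela(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Minimal sketch of rectified linear attention (ReLA).

    Softmax over attention scores is replaced by ReLU, so the weights
    are sparse (exact zeros) and unnormalized across keys; an RMS
    rescaling of the output compensates (the full method also learns
    a gain and a gate, omitted here).
    """
    d = q.size(-1)
    weights = torch.relu(q @ k.transpose(-2, -1) / math.sqrt(d))  # sparse, >= 0
    out = weights @ v
    # RMS-normalize along the feature dimension
    return out * torch.rsqrt(out.pow(2).mean(dim=-1, keepdim=True) + 1e-8)

# Usage with illustrative shapes: (batch, seq_len, dim)
q, k, v = (torch.randn(2, 5, 64) for _ in range(3))
print(rela(q, k, v).shape)  # torch.Size([2, 5, 64])
```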

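RMSNorm is easiest to see in code. Below is a minimal PyTorch sketch matching the description in the highlight above; the `eps` default is an illustrative assumption.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root mean square layer normalization (sketch).

    Unlike LayerNorm, no mean is subtracted: activations are rescaled
    by their root mean square alone, then multiplied by a learned
    gain, making this a drop-in replacement for LayerNorm.
    """
    def __init__(self, dim: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps                            # illustrative default
        self.gain = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x / sqrt(mean(x^2) + eps), over the feature dimension
        rms_inv = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms_inv * self.gain

# Usage: same call pattern as nn.LayerNorm over the last dimension
norm = RMSNorm(dim=512)
print(norm(torch.randn(2, 10, 512)).shape)  # torch.Size([2, 10, 512])
```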

Summary

Introduction

Although end-to-end (E2E) speech translation (ST) has achieved great success in recent years, outperforming its cascading counterpart and delivering state-of-the-art performance on several benchmarks (Ansari et al., 2020; Zhang et al., 2020a; Zhao et al., 2020), it still suffers from the relatively low amount of dedicated speech-to-translation parallel training data (Salesky et al., 2021). Whether and how similar success can be obtained in very low-resource (and practical) scenarios for multilingual ST with E2E models remains an open question. To address this question, we participated in the IWSLT2021 multilingual speech translation task, which focuses on low-resource ST language pairs in a multilingual setup. The task is organized in two settings: a constrained setting and an unconstrained setting. The former restricts participants to using only the given Multilingual TEDx data (Salesky et al., 2021) for experiments, while the latter allows additional ASR/ST/MT/other training data.


