Sequence-to-Sequence Models Can Directly Translate Foreign Speech

Ron J Weiss, Navdeep Jaitly, Zhifeng Chen, Yonghui Wu, Jan Chorowski

doi:10.21437/interspeech.2017-503

Abstract

We present a recurrent encoder-decoder deep neural network architecture that directly translates speech in one language into text in another. The model does not explicitly transcribe the speech into text in the source language, nor does it require supervision from the ground truth source language transcription during training. We apply a slightly modified sequence-to-sequence with attention architecture that has previously been used for speech recognition and show that it can be repurposed for this more complex task, illustrating the power of attention-based models. A single model trained end-to-end obtains state-of-the-art performance on the Fisher Callhome Spanish-English speech translation task, outperforming a cascade of independently trained sequence-to-sequence speech recognition and machine translation models by 1.8 BLEU points on the Fisher test set. In addition, we find that making use of the training data in both languages by multi-task training sequence-to-sequence speech translation and recognition models with a shared encoder network can improve performance by a further 1.4 BLEU points.
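
As a rough illustration of the shared-encoder idea described in the abstract, the sketch below shows two decoders attending over the same encoder outputs. It is not the authors' implementation: it substitutes plain dot-product attention for whatever attention mechanism the paper uses, and the vectors, dimensions, and names (`attend`, `encoder_states`, `translation_query`, `recognition_query`) are hypothetical values chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """One decoder step of dot-product attention (an assumed, simplified variant).

    Returns the attention weights over the encoder states and the
    resulting context vector (the weighted sum of those states).
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical encoder outputs: one vector per block of speech frames.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Two decoders (translation and recognition) read the SAME encoder outputs,
# each with its own query vector. The query values here are made up.
translation_query = [2.0, 0.0]
recognition_query = [0.0, 2.0]

t_weights, t_context = attend(translation_query, encoder_states)
r_weights, r_context = attend(recognition_query, encoder_states)

print("translation attention:", [round(w, 3) for w in t_weights])
print("recognition attention:", [round(w, 3) for w in r_weights])
```

In a trained model the encoder states and decoder queries would come from learned recurrent layers. The point of the sketch is only that both tasks draw on a single encoder representation, which is what allows training data in both languages to shape it.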


