Transformer-Based Direct Speech-To-Speech Translation with Transcoder

Takatomo Kano, Satoshi Nakamura, Sakriani Sakti

doi:10.1109/slt48900.2021.9383496

Abstract

Traditional speech translation systems use a cascade approach that chains automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) synthesis to translate speech from one language to another step by step. Unfortunately, because these components are trained separately, MT often struggles to handle ASR errors, resulting in unnatural translations. Recently, one work attempted to construct direct speech translation in a single model. The model used a multi-task scheme that learns to predict not only the target speech spectrograms directly but also the source and target phoneme transcriptions as auxiliary tasks. However, that work was evaluated only on Spanish-English language pairs, which have similar syntax and word order. For syntactically distant language pairs, speech translation requires long-distance word reordering, and thus direct speech frame-to-frame alignment becomes difficult. Another direction was to construct a single deep-learning framework while keeping the step-by-step translation process. However, such studies focused only on speech-to-text translation. Furthermore, all of these works were based on recurrent neural network (RNN) models. In this work, we propose a step-by-step scheme for complete end-to-end speech-to-speech translation and a Transformer-based speech translation model using a Transcoder. We compare our proposed model with the multi-task model on syntactically similar and distant language pairs.

Similar Papers

End-to-End Speech Translation With Transcoding by Multi-Task Learning for Distant Language Pairs
Takatomo Kano ... Sakriani Sakti
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 28
01 Jan 2020

Unsupervised Neural Machine Translation for Similar and Distant Language Pairs
Haipeng Sun ... Rui Wang
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 20
31 Jan 2021

Unsupervised Multimodal Machine Translation for Low-resource Distant Language Pairs
Turghun Tayir ... Lin Li
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 23
15 Apr 2024

Structured-Based Curriculum Learning for End-to-End English-Japanese Speech Translation
Takatomo Kano ... Sakriani Sakti
20 Aug 2017
