Sequence-to-Sequence Models for Emphasis Speech Translation

Quoc Truong Do, Sakriani Sakti, Satoshi Nakamura

doi:10.1109/taslp.2018.2846402

Abstract

Speech-to-speech translation (S2ST) systems can break language barriers in cross-lingual communication by translating speech across languages. Recent studies have extended existing S2ST systems to handle not only linguistic meaning but also paralinguistic information such as emphasis, by adding emphasis estimation and emphasis translation components. However, the approach used for emphasis translation is not optimal for sequence translation tasks and cannot easily handle the long-term dependencies of words and emphasis levels. It also requires the quantization of emphasis levels and treats them as discrete labels instead of continuous values. Moreover, the whole translation pipeline is fairly complex and slow because all components are trained separately without joint optimization. In this paper, we make two contributions: (1) we propose an approach based on sequence-to-sequence models that can handle continuous emphasis levels, and (2) we combine machine translation and emphasis translation into a single model, which greatly simplifies the translation pipeline and makes it easier to perform joint optimization. Our results on an emphasis translation task indicate that our translation models outperform previous models by a large margin in both objective and subjective tests. Experiments on a joint translation model also show that our models can perform joint translation of words and emphasis with one-word delays instead of full-sentence delays while preserving the translation performance of both tasks.
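
The abstract contrasts quantized, discrete emphasis labels with continuous emphasis values. The following is a minimal sketch of why quantization can lose information; it is not the method of the paper or of the earlier systems. The `quantize` helper, the two-bin setting, the assumed range of [0, 1] for emphasis levels, and the example values are all hypothetical and chosen only for illustration.

```python
def quantize(level, num_bins=2):
    """Map a continuous emphasis level in [0, 1] to the midpoint of its bin.

    Assumption: emphasis levels are real numbers in [0, 1]; the source
    abstract does not state the range.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("emphasis level must lie in [0, 1]")
    # Index of the bin containing the level; level == 1.0 falls in the last bin.
    index = min(int(level * num_bins), num_bins - 1)
    return (index + 0.5) / num_bins


# Hypothetical word-level emphasis values for a four-word utterance.
levels = [0.05, 0.45, 0.55, 0.95]

# Discrete view: each value is replaced by the representative of its bin.
quantized = [quantize(x) for x in levels]

# Absolute error introduced by treating the levels as discrete labels.
errors = [abs(x - q) for x, q in zip(levels, quantized)]

for x, q, e in zip(levels, quantized, errors):
    print(f"continuous={x:.2f}  quantized={q:.2f}  error={e:.2f}")
```

With two bins, the clearly different levels 0.05 and 0.45 receive the same label, while the nearly equal levels 0.45 and 0.55 are split apart. A model that predicts continuous values directly avoids this particular source of error.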

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Oct 1, 2018
Citations: 10
