UWSpeech: Speech to Speech Translation for Unwritten Languages

Chen Zhang, Xu Tan, Tao Qin, Tie-Yan Liu, Yi Ren, Kejun Zhang

doi:10.1609/aaai.v35i16.17684

Abstract

Existing speech-to-speech translation systems rely heavily on target-language text. They usually either translate source-language speech into target text and then synthesize target speech from that text, or translate directly into target speech while using target text for auxiliary training. However, these methods cannot be applied to unwritten target languages, for which no written text or phonemes are available. In this paper, we develop a translation system for unwritten languages, named UWSpeech. It converts target unwritten speech into discrete tokens with a converter, translates source-language speech into target discrete tokens with a translator, and finally synthesizes target speech from the target discrete tokens with an inverter. We propose a method called XL-VAE, which enhances the vector quantized variational autoencoder (VQ-VAE) with cross-lingual (XL) speech recognition, to train the converter and inverter of UWSpeech jointly. Experiments on the Fisher Spanish-English conversation translation dataset show that UWSpeech outperforms direct translation and the VQ-VAE baseline by about 16 and 10 BLEU points, respectively, which demonstrates the advantages and potential of UWSpeech.
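The converter/translator/inverter pipeline described in the abstract can be illustrated with a toy sketch. This is not the UWSpeech implementation: it assumes the speech has already been encoded into fixed-length feature frames, models only the vector-quantization step that turns frames into discrete tokens (as in a VQ-VAE), and replaces the trained inverter with a plain codebook lookup. The names `quantize`, `dequantize`, and `translate_speech` are hypothetical, and the cross-lingual speech recognition component of XL-VAE is not modeled at all.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: one speech frame = one feature vector.
Frame = Sequence[float]


def squared_distance(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: List[Frame], codebook: List[Frame]) -> List[int]:
    """Converter-style VQ step: map each frame to the index of its nearest codeword.

    Assumption: frames are already encoder outputs; the encoder itself is omitted.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]


def dequantize(tokens: List[int], codebook: List[Frame]) -> List[List[float]]:
    """Toy stand-in for the inverter: look each token's codeword back up.

    A real inverter is a trained decoder that synthesizes a waveform or spectrogram.
    """
    return [list(codebook[t]) for t in tokens]


def translate_speech(
    source_frames: List[Frame],
    translator: Callable[[List[Frame]], List[int]],
    codebook: List[Frame],
) -> List[List[float]]:
    """Inference path: source speech -> target discrete tokens -> target speech.

    `translator` is a hypothetical trained model that predicts target tokens.
    """
    target_tokens = translator(source_frames)
    return dequantize(target_tokens, codebook)


# Usage with a made-up 2-D codebook of three codewords.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]
print(quantize(frames, codebook))  # [0, 1, 2]
```

In training, the converter's token sequences for target speech would serve as the translator's prediction targets, which is what lets the pipeline avoid target text entirely; the sketch only shows how tokens flow between the three stages.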

More From: Proceedings of the AAAI Conference on Artificial Intelligence

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: May 18, 2021
Citations: 17

Similar Papers

Combining multiple translation systems for Spoken Language Understanding portability
F Garcia ... E Segarra
01 Dec 2012

Baidu Translate: Research and Products
Zhongjun He
01 Jan 2015

Acoustic and lexical resource constrained ASR using language-independent acoustic model and language-dependent probabilistic lexical model
Ramya Rasipuram ... Mathew Magimai-Doss
Speech Communication, Vol. 68
29 Dec 2015

Automatic pronunciation prediction for text-to-speech synthesis of dialectal Arabic in a speech-to-speech translation system
Sankaranarayanan Ananthakrishnan ... Prem Natarajan
01 Mar 2012
