Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement

Yichao Du, Tong Xu, Zhirui Zhang, Boxing Chen, Weizhi Wang, Jun Xie

doi:10.1609/aaai.v36i10.21303

Abstract

End-to-end speech-to-text translation (E2E-ST) is becoming increasingly popular due to its potential for less error propagation, lower latency, and fewer parameters. Given a triplet training corpus 〈speech, transcription, translation〉, a conventional high-quality E2E-ST system uses the 〈speech, transcription〉 pairs to pre-train the model and then the 〈speech, translation〉 pairs to optimize it further. However, each stage of this process involves only paired data, and this loose coupling fails to fully exploit the association within the triplet data. In this paper, we model the joint probability of transcription and translation conditioned on the speech input so as to leverage the triplet data directly. On this basis, we propose a novel regularization method for model training that improves the agreement between the two paths of the dual-path decomposition of this joint probability, which should be equal in theory. To this end, we introduce two Kullback-Leibler divergence regularization terms into the training objective to reduce the mismatch between the output probabilities of the two paths. The well-trained model can then be naturally converted into an E2E-ST model by means of a pre-defined early-stop tag. Experiments on the MuST-C benchmark demonstrate that the proposed approach significantly outperforms state-of-the-art E2E-ST baselines on all 8 language pairs while also achieving better performance on the automatic speech recognition task.
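The agreement idea in the abstract can be illustrated with a toy example. By the chain rule, the joint probability of a transcription x and a translation y given speech s can be factored in two orders: p(x, y | s) = p(x | s) p(y | x, s) = p(y | s) p(x | y, s). The sketch below is not the authors' implementation: it assumes tiny discrete tables for a single fixed speech input, all numbers and function names (`kl_divergence`, `joint_from_path`, `agreement_loss`) are hypothetical, and it compares whole joint tables, whereas the abstract only states that two KL terms are applied to the output probabilities of the two paths.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def joint_from_path(first, second_given_first):
    """Chain rule: joint[i][j] = first[i] * second_given_first[i][j]."""
    return [[f * c for c in row] for f, row in zip(first, second_given_first)]


def transpose(table):
    return [list(col) for col in zip(*table)]


def flatten(table):
    return [v for row in table for v in row]


def agreement_loss(joint_a, joint_b):
    """Sum of the two KL directions between two joint tables (same layout)."""
    a, b = flatten(joint_a), flatten(joint_b)
    return kl_divergence(a, b) + kl_divergence(b, a)


# Path 1 (toy numbers): transcription first, then translation.
p_x = [0.6, 0.4]                            # p(x | s), 2 transcriptions
p_y_given_x = [[0.7, 0.2, 0.1],             # p(y | x, s), 3 translations
               [0.1, 0.3, 0.6]]
joint_1 = joint_from_path(p_x, p_y_given_x)  # indexed [x][y]

# Path 2, consistent case: derive p(y | s) and p(x | y, s) exactly from joint_1.
p_y = [sum(col) for col in zip(*joint_1)]
p_x_given_y = [[joint_1[x][y] / p_y[y] for x in range(2)] for y in range(3)]
joint_2 = transpose(joint_from_path(p_y, p_x_given_y))  # back to [x][y]
consistent_loss = agreement_loss(joint_1, joint_2)

# Path 2, mismatched case: a separately estimated (hypothetical) p(y | s).
p_y_model = [0.5, 0.3, 0.2]
joint_2_bad = transpose(joint_from_path(p_y_model, p_x_given_y))
mismatched_loss = agreement_loss(joint_1, joint_2_bad)

print(f"consistent paths: {consistent_loss:.6f}")
print(f"mismatched paths: {mismatched_loss:.6f}")
```

When both paths are derived from the same joint distribution the loss is numerically zero; when one path is estimated separately the loss is positive, which is the kind of mismatch a KL-based regularizer would penalize during training.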

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 28, 2022
Citations: 7

Similar Papers

The use of discrete distributions with a very large codebook for automatic speech recognition and speaker verification
Guoli Ye
23 Dec 2014

Dynamic boundary detection for speech translation
Nina Zhou ... Aiti Aw
01 Dec 2017

Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition
Qingyu Wang ... Bo Xu
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 37
26 Jun 2023

A Comparison of Modeling Units in Sequence-to-Sequence Speech Recognition with the Transformer on Mandarin Chinese
Shiyu Zhou ... Shuang Xu
01 Jan 2018
