End-to-end Feedback Loss in Speech Chain Framework via Straight-through Estimator

Andros Tjandra, Satoshi Nakamura, Sakriani Sakti

doi:10.1109/icassp.2019.8683480

Abstract

The speech chain mechanism integrates automatic speech recognition (ASR) and text-to-speech synthesis (TTS) modules into a single cycle during training. In our previous work, we applied the speech chain mechanism to semi-supervised learning. It enables ASR and TTS to assist each other when they receive unpaired data: each module infers the missing pair, and the models are optimized with a reconstruction loss. If we only have speech without transcription, ASR generates the most likely transcription from the speech data, and TTS then uses the generated transcription to reconstruct the original speech features. However, in our previous papers, we limited back-propagation to the closest module, which is the TTS. One reason is that back-propagating the error through the ASR is challenging because the ASR outputs are discrete tokens, which creates a non-differentiable link between the TTS and the ASR. In this paper, we address this problem and describe how to train a speech chain fully end-to-end with the reconstruction loss using a straight-through estimator (ST). Experimental results revealed that, with sampling from ST-Gumbel-Softmax, we were able to update the ASR parameters and improve ASR performance, achieving an 11% relative character error rate (CER) reduction compared to the baseline.
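The abstract names the ST-Gumbel-Softmax estimator but does not detail it. As a rough illustration only, the sketch below implements a generic straight-through Gumbel-Softmax step in plain Python for a single output token. It is not the authors' implementation: the function names (`gumbel_noise`, `st_gumbel_forward`, `st_backward`), the temperature value, the 3-token vocabulary, and the toy linear loss standing in for the TTS reconstruction loss are all hypothetical. The forward pass emits a discrete one-hot token (what the TTS would receive), while the backward pass treats that token as if it were the soft relaxed sample, so a gradient can flow back into the ASR logits.

```python
import math
import random


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_noise(n, rng, eps=1e-20):
    # Standard Gumbel(0, 1) samples: g = -log(-log(u)), u ~ Uniform(0, 1).
    return [-math.log(-math.log(rng.random() + eps) + eps) for _ in range(n)]


def st_gumbel_forward(logits, noise, temperature):
    """Forward pass: soft relaxed sample plus its hard one-hot (argmax) version."""
    soft = softmax([(z + g) / temperature for z, g in zip(logits, noise)])
    k = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]
    return hard, soft  # `hard` is what the downstream (TTS) module would consume


def st_backward(grad_hard, soft, temperature):
    """Straight-through backward: reuse the softmax Jacobian for the argmax.

    d soft_i / d logit_j = soft_i * (delta_ij - soft_j) / temperature, so the
    gradient w.r.t. logit_j is soft_j * (grad_j - <grad, soft>) / temperature.
    """
    dot = sum(g * s for g, s in zip(grad_hard, soft))
    return [s * (g - dot) / temperature for g, s in zip(grad_hard, soft)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 0.5, -1.0]  # hypothetical ASR scores over a 3-token vocabulary
    noise = gumbel_noise(len(logits), rng)
    hard, soft = st_gumbel_forward(logits, noise, temperature=0.5)
    w = [0.3, -0.2, 0.9]  # toy downstream loss L = sum(w_i * hard_i)
    grad_logits = st_backward(w, soft, temperature=0.5)
    print("one-hot token:", hard)
    print("gradient reaching the ASR logits:", grad_logits)
```

In a full speech chain, `grad_hard` would instead come from back-propagating the TTS reconstruction loss into the token input at each decoding step. In an autodiff framework the same trick is typically written as `hard + soft - stop_gradient(soft)`, whose value equals `hard` while its gradient equals that of `soft`.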

Similar Papers

Listening While Speaking and Visualizing: Improving ASR Through Multimodal Chain
Johanes Effendi ... Sakriani Sakti
01 Dec 2019

Augmenting Images for ASR and TTS Through Single-Loop and Dual-Loop Multimodal Chain Framework
Johanes Effendi ... Satoshi Nakamura
25 Oct 2020

Speech Chain for Semi-Supervised Learning of Japanese-English Code-Switching ASR and TTS
Sahoko Nakayama ... Satoshi Nakamura
01 Dec 2018

Machine Speech Chain with One-shot Speaker Adaptation
Andros Tjandra ... Satoshi Nakamura
02 Sep 2018
