Abstract
Recent studies argue that knowledge distillation is promising for speech translation (ST) using end-to-end models. In this work, we investigate the effect of knowledge distillation on a cascade ST composed of automatic speech recognition (ASR) and machine translation (MT) models. We distill knowledge from a teacher model that takes human transcripts as input to a student model that takes erroneous ASR transcriptions as input. Our experimental results demonstrated that knowledge distillation is beneficial for a cascade ST. A further investigation combining knowledge distillation and fine-tuning showed that the combination yields consistent improvements on two language pairs: English-Italian and Spanish-English.
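The abstract does not spell out the distillation objective, but a common token-level formulation matches the student's output distribution (computed from the ASR transcript) to the teacher's (computed from the clean transcript) while keeping the usual cross-entropy against the gold translation. The following is a minimal PyTorch sketch under that assumption; alpha and temperature are hypothetical hyperparameters, not values taken from the paper.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_ids,
                      alpha=0.5, temperature=1.0):
    """Token-level KD: blend soft teacher targets with the gold cross-entropy.

    student_logits: (batch, tgt_len, vocab) from the MT model fed ASR output
    teacher_logits: (batch, tgt_len, vocab) from the MT model fed the clean transcript
    gold_ids:       (batch, tgt_len) reference translation token ids
    """
    # Soft loss: KL divergence between the student and the (frozen) teacher distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard loss: standard cross-entropy against the reference translation
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        gold_ids.view(-1),
    )
    return alpha * soft + (1.0 - alpha) * hard
```

In this view the teacher is run once on the clean transcripts (its logits can be precomputed), and only the student, which sees the erroneous ASR output, is updated.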
Highlights
Speech translation (ST) converts utterances in a source language into text in another language
Our work focuses on applying knowledge distillation (KD) to a cascade ST, using a teacher model that takes clean transcripts as input to guide a student model that takes erroneous ASR inputs
In the cascade ST, the performance of a system trained using only automatic speech recognition (ASR) input (MTasr) was worse than that of a system trained on clean input (MTclean), with a 0.3-BLEU drop on the ASR-based test data and a 2.5-BLEU drop on the clean test data (a scoring sketch follows below)
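A BLEU comparison of this kind can be scored with sacrebleu, evaluating each system's detokenized output against the references for the clean and ASR-based test sets. This is only an illustrative sketch; the file names are hypothetical placeholders, not artifacts from the paper.

```python
import sacrebleu

def score_bleu(hyp_path, ref_path):
    # One detokenized sentence per line in both files
    with open(hyp_path, encoding="utf-8") as f:
        hyps = [line.rstrip("\n") for line in f]
    with open(ref_path, encoding="utf-8") as f:
        refs = [line.rstrip("\n") for line in f]
    return sacrebleu.corpus_bleu(hyps, [refs]).score

# Hypothetical output/reference files for the two systems and two test conditions
for system in ("mt_clean", "mt_asr"):
    for test_set in ("clean_test", "asr_test"):
        score = score_bleu(f"{system}.{test_set}.hyp", f"{test_set}.ref")
        print(f"{system} on {test_set}: {score:.1f} BLEU")
```

The reported drops correspond to the difference between the MTasr and MTclean scores under each test condition.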
Summary
Speech translation (ST) converts utterances in a source language into text in another language. A newer approach, called end-to-end or direct ST, uses a single model to translate source-language speech directly into target-language text (Berard et al., 2016). A naive end-to-end ST without additional training tasks, such as ASR, remains inferior to a cascade ST (Liu et al., 2018; Salesky and Black, 2020). It also requires parallel data pairing source-language speech with target-language text, which cannot easily be obtained in practice.