Abstract

Recent studies argue that knowledge distillation is promising for speech translation (ST) using end-to-end models. In this work, we investigate the effect of knowledge distillation on a cascade ST composed of automatic speech recognition (ASR) and machine translation (MT) models. We distill knowledge from a teacher model trained on human transcripts to a student model trained on erroneous ASR transcriptions. Our experimental results demonstrate that knowledge distillation is beneficial for a cascade ST. A further investigation combining knowledge distillation with fine-tuning revealed that the combination consistently improved results on two language pairs: English-Italian and Spanish-English.

Highlights

  • Speech translation (ST) converts utterances in a source language into text in another language

  • Our work focuses on applying knowledge distillation (KD) to a cascade ST, using a teacher model trained on clean transcripts to teach a student model that takes erroneous ASR inputs

  • In the cascade ST, a system trained using only automatic speech recognition (ASR) output (MT_asr) performed worse than one trained on clean input (MT_clean): a 0.3-BLEU drop on the ASR-based test data and a 2.5-BLEU drop on the clean test data
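The paper does not spell out its KD formulation in this excerpt; as an illustration of the general idea, word-level knowledge distillation typically trains the student to match the teacher's softened output distribution over the target vocabulary by minimizing a cross-entropy (equivalently, a KL-divergence up to a constant) per target token. A minimal sketch, where the function names and the temperature value are our own assumptions, not the paper's:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over one token's vocabulary logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Word-level distillation loss for a single target position:
    cross-entropy between the teacher's softened distribution
    (computed from clean-transcript input) and the student's
    (computed from erroneous ASR input)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(p_teacher, p_student))
```

In practice this per-token loss is averaged over the target sequence and interpolated with the usual cross-entropy against the reference translation; the cross-entropy is minimized exactly when the student matches the teacher's distribution.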


Summary

Introduction

Speech translation (ST) converts utterances in a source language into text in another language. A newer approach, called end-to-end or direct ST, uses a single model to directly translate source-language speech into target-language text (Berard et al., 2016). A naive end-to-end ST without additional training, such as on ASR tasks, remains inferior to a cascade ST (Liu et al., 2018; Salesky and Black, 2020). It also requires parallel data pairing source-language speech with target-language text, which is difficult to obtain in practice.


