Abstract

Direct speech translation (ST) has been shown to be a complex task that requires knowledge transfer from its sub-tasks: automatic speech recognition (ASR) and machine translation (MT). For MT, one of the most promising techniques for transferring knowledge is knowledge distillation. In this paper, we compare different solutions for distilling knowledge in a sequence-to-sequence task such as ST. Moreover, we analyze potential drawbacks of this approach and how to alleviate them while maintaining the benefits in terms of translation quality.
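As a rough illustration of one such solution, word-level knowledge distillation trains the student to match the teacher's per-token output distribution. The sketch below is illustrative only (the function name, tensor shapes, and temperature handling are assumptions, not the paper's implementation) and uses PyTorch-style operations:

    import torch.nn.functional as F

    def word_level_kd_loss(student_logits, teacher_logits, temperature=1.0):
        # student_logits, teacher_logits: (batch, target_len, vocab_size)
        # Soften both distributions with the same temperature, then take the
        # KL divergence from teacher to student, averaged over target tokens.
        t = temperature
        vocab = student_logits.size(-1)
        student_log_probs = F.log_softmax(student_logits / t, dim=-1).view(-1, vocab)
        teacher_probs = F.softmax(teacher_logits / t, dim=-1).view(-1, vocab)
        # Scaling by t**2 keeps gradient magnitudes comparable across temperatures.
        return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

A common alternative, sequence-level distillation, instead trains the student directly on the teacher's decoded outputs rather than matching per-token distributions.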

Highlights

  • With the increased interest in deep learning in recent years, there has been an explosion of machine learning tools

  • Prior work has recognized the value of dynamic eager execution for deep learning, and some recent frameworks implement this define-by-run approach, but do so either at the cost of performance (Chainer [5]) or using a less expressive, faster language (Torch [6], DyNet [7]), which limits their applicability (a minimal eager-execution sketch follows this list)

  • We compare the performance of PyTorch with several other commonly-used deep learning libraries, and find that it achieves competitive performance across a range of tasks
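To make the define-by-run point above concrete, here is a minimal eager-execution sketch (assuming PyTorch; the tensors and shapes are invented for illustration). The graph is recorded while ordinary Python runs, so data-dependent control flow is handled naturally:

    import torch

    x = torch.randn(4, 3)
    w = torch.randn(3, 2, requires_grad=True)

    # The computation graph is built as this code executes ("define-by-run"),
    # so an ordinary Python branch can depend on the data itself.
    h = x @ w
    h = torch.relu(h) if h.sum() > 0 else torch.tanh(h)
    loss = h.pow(2).mean()

    loss.backward()          # gradients flow through whichever branch actually ran
    print(w.grad.shape)      # torch.Size([3, 2])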


Summary

Introduction

With the increased interest in deep learning in recent years, there has been an explosion of machine learning tools. Many popular frameworks such as Caffe [1], CNTK [2], TensorFlow [3], and Theano [4] construct a static dataflow graph that represents the computation and which can be applied repeatedly to batches of data.

Two earlier developments laid the groundwork for these tools. First, starting in the 1960s, the development of domain-specific languages such as APL [8], MATLAB [9], R [10] and Julia [11] turned multidimensional arrays (often referred to as tensors) into first-class objects supported by a comprehensive set of mathematical primitives (or operators) to manipulate them. Libraries such as NumPy [12], Torch [6], Eigen [13] and Lush [14] made array-based programming productive in general-purpose languages such as Python, Lisp, C++ and Lua. Second, the development of automatic differentiation [15] made it possible to fully automate the daunting labor of computing derivatives. The autograd [16] package popularized the use of this technique for NumPy arrays, and similar approaches are used in frameworks such as Chainer [5], DyNet [7], Lush [14], Torch [6], Jax [17] and Flux.jl [18].
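As a concrete example of the two ingredients just described, array-based programming plus automatic differentiation, the autograd package wraps NumPy so that gradients of ordinary array code can be computed automatically. The snippet below is a generic illustration (the function and data are invented, and exact API details should be checked against the autograd documentation):

    import autograd.numpy as np      # NumPy wrapped for differentiation
    from autograd import grad

    def mse_loss(w, x, y):
        # Plain array code: multidimensional arrays are first-class objects
        # with a rich operator set, exactly the style described above.
        pred = np.tanh(np.dot(x, w))
        return np.mean((pred - y) ** 2)

    grad_loss = grad(mse_loss)       # d(loss)/dw, derived automatically
    w = np.random.randn(3, 1)
    x, y = np.random.randn(8, 3), np.random.randn(8, 1)
    print(grad_loss(w, x, y).shape)  # (3, 1)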

