Abstract

Neural machine translation (NMT) systems have greatly improved the quality available from machine translation (MT) compared to statistical machine translation (SMT) systems. However, these state-of-the-art NMT models need much more computing power and data than SMT models, a requirement that is unsustainable in the long run and of very limited benefit in low-resource scenarios. To some extent, model compression, and more specifically state-of-the-art knowledge distillation techniques, can remedy this. In this work, we investigate knowledge distillation on a simulated low-resource German-to-English translation task. We show that sequence-level knowledge distillation can be used to train small student models on knowledge distilled from large teacher models. Part of this work examines the influence of hyperparameter tuning on model performance when lowering the number of Transformer heads or limiting the vocabulary size. Interestingly, in some cases the accuracy of these student models is higher than that of their teachers, even though the student models take less time to train. In a novel contribution, we demonstrate for a specific MT service provider that, in the post-deployment phase, distilled student models can reduce both emissions and purely monetary cost by almost 50%.
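To make the distillation recipe concrete, the sketch below illustrates sequence-level knowledge distillation in its simplest form: a large teacher model beam-searches translations of the training sources, and those outputs become the targets on which a smaller student is trained. This is a minimal illustration under assumptions of our own; the Hugging Face MarianMT checkpoint, the beam size and the helper function distill_corpus are chosen for exposition and are not the models or pipeline used in this work.

# Minimal sketch of sequence-level knowledge distillation for NMT.
# Illustrative only: the teacher checkpoint and training step are assumptions,
# not the setup evaluated in the paper.
import torch
from transformers import MarianMTModel, MarianTokenizer

teacher_name = "Helsinki-NLP/opus-mt-de-en"   # assumed German-to-English teacher
tokenizer = MarianTokenizer.from_pretrained(teacher_name)
teacher = MarianMTModel.from_pretrained(teacher_name).eval()

def distill_corpus(source_sentences, num_beams=5, max_length=128):
    """Step 1: the teacher beam-searches translations of the training sources;
    these pseudo-targets replace the human references for student training."""
    pseudo_targets = []
    with torch.no_grad():
        for src in source_sentences:
            batch = tokenizer([src], return_tensors="pt", truncation=True)
            out = teacher.generate(**batch, num_beams=num_beams, max_length=max_length)
            pseudo_targets.append(tokenizer.decode(out[0], skip_special_tokens=True))
    return pseudo_targets

# Step 2 (schematic): train a smaller student, e.g. with fewer Transformer heads
# or a reduced vocabulary, using ordinary cross-entropy on (source, pseudo-target) pairs.
sources = ["Das ist ein Test.", "Maschinelle Übersetzung spart Zeit."]
print(distill_corpus(sources))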

Highlights

  • We use sequence-level knowledge distillation and show that small student models can outperform large teacher models.

  • Small student models prove very useful where machine translation (MT) models need to be deployed in environments with constrained hardware.

  • We demonstrate a translation industry scenario where knowledge distillation in neural machine translation (NMT) yields substantial savings: focusing on three parameters of translation projects that are of crucial importance in industrial settings, namely translation time, translation cost and carbon emissions, we show that savings of almost 50% can be achieved.

  • Our investigation focuses on the performance evaluation of small and large NMT models.

Introduction

Deep neural networks (DNNs) underpin state-of-the-art applications of artificial intelligence (AI) in almost all fields, such as image, speech and natural language processing (NLP). DNN architectures [1] are often data-, compute-, space-, power- and energy-hungry, typically requiring powerful graphics processing units (GPUs) or large-scale clusters to train and deploy, which has led to them being viewed as a “non-green” technology [2]. In its Work Programme for 2021–2022, adopted on 15 June 2021, the European Commission committed to making Europe the world’s first climate-neutral continent by 2050. If this important goal is to be achieved, more efficient AI models have to play their part in helping to reduce the amount of energy required for data storage and algorithm training.
