Abstract
Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information on similarities among categories provided by the teacher, so in practice only strong teacher models are deployed to teach weaker students. In low-resource neural machine translation, however, a stronger teacher model is usually not available. We therefore propose a novel Teacher-free Knowledge Distillation framework for low-resource neural machine translation, in which the model learns from a manually designed regularization distribution that acts as a virtual teacher. This hand-crafted prior distribution not only captures similarity information between words but also provides effective regularization for model training. Experimental results show that the proposed method effectively improves performance on low-resource language pairs.
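To make the idea concrete, below is a minimal sketch of a "virtual teacher" loss, assuming PyTorch and a label-smoothing-style prior that puts most of its mass on the gold token and spreads the rest uniformly over the remaining vocabulary. The function name, the specific prior shape, and the hyperparameters (correct_prob, temperature, alpha) are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def teacher_free_kd_loss(student_logits, targets, correct_prob=0.9,
                         temperature=1.0, alpha=0.5):
    """Teacher-free KD sketch.

    student_logits: (batch, vocab) raw decoder scores.
    targets: (batch,) gold token ids.
    """
    vocab_size = student_logits.size(-1)

    # Standard cross-entropy against the gold tokens.
    ce = F.cross_entropy(student_logits, targets)

    # Manually designed "virtual teacher": probability `correct_prob` on the
    # gold token, the remainder spread uniformly over all other tokens
    # (an assumption of this sketch).
    smooth = (1.0 - correct_prob) / (vocab_size - 1)
    teacher = torch.full_like(student_logits, smooth)
    teacher.scatter_(1, targets.unsqueeze(1), correct_prob)

    # KL divergence between the student's tempered distribution and the prior.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_p_student, teacher, reduction="batchmean") * temperature ** 2

    # Interpolate the two objectives, as is common in KD-style training.
    return (1.0 - alpha) * ce + alpha * kd
```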
Highlights
Knowledge Distillation aims to transfer knowledge from one neural network to another
It is commonly believed that the soft targets of the teacher model can transfer "dark knowledge" containing privileged information on similarity among different categories to enhance the student model [2]–[4]
Vanilla neural machine translation (NMT) models are data-hungry and learn poorly from low-count events. This makes them a poor choice for low-resource languages, where parallel data is scarce, and limits the further application of neural machine translation models [11]
Summary
Knowledge Distillation aims to transfer knowledge from one neural network (teacher) to another (student). Neural machine translation (NMT) is a deep learning based method [6]–[8] for translation that has recently shown promising results on many language pairs [9]. NMT models are data-hungry and learn poorly from low-count events; this behavior makes vanilla NMT a poor choice for low-resource languages, where parallel data is scarce, and limits the further application of neural machine translation models [11]. On the relatively low-resource Uyghur-Chinese and Mongol-Chinese language pairs, experimental results show that the proposed method makes effective use of the similarity information in soft targets and significantly improves translation quality
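Since an NMT decoder produces a distribution over the vocabulary at every target position, a loss like the virtual-teacher sketch above would be applied token by token. The snippet below is a hypothetical usage example with illustrative shapes; padding handling (e.g., masking pad tokens) is omitted for brevity.

```python
import torch

# Hypothetical shapes: batch of 2 sentences, 5 target positions, vocab of 1000.
logits = torch.randn(2, 5, 1000, requires_grad=True)  # decoder output logits
gold = torch.randint(0, 1000, (2, 5))                  # gold target token ids

# Flatten batch and time dimensions so each target token is one "sample",
# then apply the teacher-free KD loss sketched earlier.
loss = teacher_free_kd_loss(logits.view(-1, 1000), gold.view(-1),
                            correct_prob=0.9, temperature=1.0, alpha=0.5)
loss.backward()  # gradients flow into the student NMT model as usual
```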