Agglutinative languages often contain morphologically complex words (MCWs) composed of multiple morphemes arranged in a hierarchical structure, which poses significant challenges for machine translation. We present a novel knowledge distillation approach tailored to improving the translation of such languages. Our method comprises an encoder, a forward decoder, and two auxiliary decoders: a backward decoder and a morphological decoder. The forward decoder generates target morphemes autoregressively and is augmented by distilling knowledge from the auxiliary decoders; the backward decoder incorporates future context, while the morphological decoder integrates target-side morphological information. We also design a reliability estimation method that selectively distills only reliable knowledge from the auxiliary decoders. Because our approach relies on morphological word segmentation, we further show that word segmentation based on unsupervised morphology learning outperforms the commonly used Byte Pair Encoding method for highly agglutinative languages in translation tasks. Experiments on English-Tamil, English-Manipuri, and English-Marathi datasets show that the proposed approach achieves significant improvements over strong Transformer-based NMT baselines.
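The reliability-gated distillation idea can be sketched minimally as a masked cross-entropy between the forward (student) decoder and an auxiliary (teacher) decoder's distribution. Everything here is an illustrative assumption, not the paper's actual formulation: the reliability proxy (the auxiliary decoder's probability of the gold token), the threshold, and all function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def reliability(teacher_probs, gold_ids):
    # Assumed reliability proxy: the auxiliary decoder's probability of the
    # gold token at each target position; low values flag unreliable positions.
    return teacher_probs[np.arange(len(gold_ids)), gold_ids]

def selective_kd_loss(student_logits, teacher_probs, gold_ids, threshold=0.5):
    # Cross-entropy of the student distribution against the teacher
    # distribution, kept only at positions the teacher is deemed reliable on.
    p_student = softmax(student_logits)
    ce = -(teacher_probs * np.log(p_student + 1e-9)).sum(axis=-1)
    mask = (reliability(teacher_probs, gold_ids) >= threshold).astype(float)
    return (mask * ce).sum() / max(mask.sum(), 1.0)
```

In a full system this term would be added, with a weighting coefficient, to the standard cross-entropy loss of the forward decoder, once per auxiliary decoder.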