Abstract

Non-Autoregressive Machine Translation (NAT) is a significant advance in Machine Translation: it predicts output tokens in parallel, yielding substantially faster translation than traditional auto-regressive (AR) models. Recent NAT models have struck a strong balance between translation quality and speed, in some settings surpassing their AR counterparts. The widely used Knowledge Distillation (KD) technique for NAT generates training data from a pre-trained AR model, improving NAT model performance. While KD has consistently shown empirical effectiveness and substantial accuracy gains for NAT models, its potential for Indic languages has yet to be explored. This study pioneers the evaluation of NAT models for Indic languages, focusing mainly on Kashmiri-to-English translation. We vary the number of encoder and decoder layers and fine-tune hyper-parameters, shedding light on the vital role KD plays in helping NAT models capture variation in the output data effectively. Our NAT models trained with KD achieve sacreBLEU scores ranging from 16.20 to 22.20. The Insertion Transformer reaches a sacreBLEU of 22.93, approaching AR model performance.
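The sequence-level KD setup described above can be summarized with a minimal sketch, assuming a pre-trained AR teacher exposed through a hypothetical `teacher_translate` callable (not part of the paper's released code): the teacher decodes each source sentence, and its outputs replace the gold targets used to train the NAT student.

```python
# Minimal sketch of sequence-level knowledge distillation for NAT training data.
# Assumptions (illustrative, not from the paper): the AR teacher is wrapped in a
# callable `teacher_translate`; in practice this would be a trained Transformer
# checkpoint decoded with beam search.

def build_distilled_corpus(src_sentences, teacher_translate):
    """Replace gold targets with AR teacher outputs (sequence-level KD).

    src_sentences     : list of Kashmiri source sentences
    teacher_translate : callable mapping a source sentence to the teacher's
                        English translation
    """
    distilled_pairs = []
    for src in src_sentences:
        hyp = teacher_translate(src)        # AR teacher decodes the source
        distilled_pairs.append((src, hyp))  # NAT student trains on (src, hyp)
    return distilled_pairs


if __name__ == "__main__":
    # Toy stand-in for a real AR teacher, used only to show the data flow.
    toy_teacher = lambda s: s[::-1]
    pairs = build_distilled_corpus(["<kashmiri sentence>"], toy_teacher)
    print(pairs)
```

The distilled corpus is typically less multi-modal than the original references, which is the usual explanation for why KD helps parallel decoders.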
