Cross-lingual word embedding alignment is critical for reference-free machine translation evaluation, where source texts are compared directly with system translations. In this paper, we show that multilingual knowledge distillation for sentence embedding alignment also achieves cross-lingual word embedding alignment implicitly. We give a simplified analysis explaining why this implicit alignment arises; the analysis further implies that the last-layer embeddings of the distilled student model align best, a prediction that our experiments on the WMT19 datasets confirm. Furthermore, with the assistance of a target-side language model, BERTScore and Word Mover's Distance computed over these cross-lingual word embeddings achieve highly competitive results in the WMT19 reference-free machine translation evaluation tasks (4 best average scores across the 3 types of language directions, and ranking first on more than half of all 18 language pairs in the system-level evaluations) when compared against current state-of-the-art (SOTA) metrics.
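To make the metric concrete, the sketch below computes a BERTScore-style F1 directly between a source sentence and a system translation, using last-layer token embeddings from a distilled multilingual student model, per the alignment analysis above. The checkpoint name, the absence of IDF weighting, and the omission of the target-side language model component are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: reference-free BERTScore over cross-lingual word embeddings
# from a distilled multilingual student model. The model checkpoint is an
# assumption for illustration; the paper's exact setup may differ.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

def token_embeddings(text: str) -> torch.Tensor:
    """Last-layer token embeddings (the layer the analysis predicts aligns best)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]           # (seq_len, dim)
    return torch.nn.functional.normalize(hidden, dim=-1)     # unit vectors for cosine sim

def xbertscore_f1(source: str, hypothesis: str) -> float:
    """BERTScore-style greedy matching between source and MT output tokens."""
    src, hyp = token_embeddings(source), token_embeddings(hypothesis)
    sim = hyp @ src.T                            # cosine similarities, (|hyp|, |src|)
    precision = sim.max(dim=1).values.mean()     # each hyp token -> best-matching src token
    recall = sim.max(dim=0).values.mean()        # each src token -> best-matching hyp token
    return float(2 * precision * recall / (precision + recall))

# Example: score a German source against an English system translation.
print(xbertscore_f1("Das Haus ist klein.", "The house is small."))
```

Because the student model's token embeddings are implicitly aligned across languages, the same greedy-matching machinery that BERTScore normally applies between a hypothesis and a same-language reference can be applied across the language boundary, which is what makes the reference-free setting possible.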