Abstract

We present a novel supervised word alignment method based on cross-language span prediction. We first formalize the word alignment problem as a collection of independent predictions from a token in the source sentence to a span in the target sentence. Since this step is equivalent to a SQuAD v2.0 style question answering task, we solve it using multilingual BERT, fine-tuned on manually created gold word alignment data. Obtaining an accurate alignment from a set of independently predicted spans is nontrivial; we greatly improve accuracy by adding the source token's context to the question and by symmetrizing the two directional predictions. In experiments on five word alignment datasets among Chinese, Japanese, German, Romanian, French, and English, our method significantly outperformed previous supervised and unsupervised word alignment methods without using any bitexts for pretraining. For example, it achieved an F1 score of 86.7 on the Chinese-English data, 13.3 points higher than the previous state-of-the-art supervised method.
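
To make the formulation concrete, here is a minimal sketch of the span prediction step, assuming a multilingual BERT checkpoint that has already been fine-tuned on gold word alignments in SQuAD v2.0 style. The checkpoint path, the ¶ boundary marker around the queried token, and the null-answer handling are illustrative assumptions, not the authors' released artifacts.

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Hypothetical path: a multilingual BERT QA model fine-tuned on gold
# word alignment data (not a published checkpoint).
MODEL = "path/to/mbert-finetuned-on-gold-alignments"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL)
model.eval()

def predict_span(source_sent, tok_start, tok_end, target_sent):
    """Predict the target-side span aligned to one source token.

    The "question" is the whole source sentence with the queried token
    surrounded by boundary marks, so the model sees the token in
    context; the "context" is the target sentence.
    """
    marked = (source_sent[:tok_start] + " ¶ "
              + source_sent[tok_start:tok_end] + " ¶ "
              + source_sent[tok_end:])
    inputs = tokenizer(marked, target_sent, return_tensors="pt",
                       truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    start = int(out.start_logits.argmax(-1))
    end = int(out.end_logits.argmax(-1))
    # SQuAD v2.0 convention: a prediction at the [CLS] position (index 0)
    # means the source token has no counterpart in the target sentence.
    if start == 0 or end < start:
        return None
    return tokenizer.decode(inputs["input_ids"][0, start:end + 1])
```

Because each source token is predicted independently in each direction, the two directional results still have to be reconciled. One plausible reading of the symmetrization step is to convert each directional span prediction into token-pair probabilities, average the two directions, and keep pairs above a threshold; the averaging scheme and the 0.4 cutoff below are assumptions for illustration, not necessarily the paper's exact procedure.

```python
from collections import defaultdict

def symmetrize(src2tgt, tgt2src, threshold=0.4):
    """src2tgt maps (i, j) to the probability that source token i aligns
    to target token j; tgt2src maps (j, i) to the probability scored in
    the reverse direction. Returns the set of kept alignment links."""
    scores = defaultdict(float)
    for (i, j), p in src2tgt.items():
        scores[(i, j)] += p / 2.0
    for (j, i), p in tgt2src.items():  # note the reversed indices
        scores[(i, j)] += p / 2.0
    return {pair for pair, p in scores.items() if p >= threshold}

# Toy example: the (0, 1) link is confident in both directions and is
# kept; (2, 3) is predicted in only one direction and is dropped.
print(sorted(symmetrize({(0, 1): 0.9, (2, 3): 0.5}, {(1, 0): 0.8})))
# -> [(0, 1)]
```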

Highlights

  • Over the last several years, machine translation accuracy has been greatly improved by neural networks (Cho et al., 2014; Sutskever et al., 2014; Bahdanau et al., 2015; Luong et al., 2015; Vaswani et al., 2017).

  • Most previous works that use neural networks for word alignment (Yang et al., 2013; Tamura et al., 2014; Legrand et al., 2016) achieved accuracies that are basically comparable to GIZA++.

  • We presented a novel supervised word alignment method using multilingual BERT, which requires as few as 300 training sentences to outperform previous supervised and unsupervised methods.

Introduction

Over the last several years, machine translation accuracy has been greatly improved by neural networks (Cho et al., 2014; Sutskever et al., 2014; Bahdanau et al., 2015; Luong et al., 2015; Vaswani et al., 2017). Word alignment tools developed during the era of statistical machine translation (Brown et al., 1993; Koehn et al., 2007), such as GIZA++ (Och and Ney, 2003), MGIZA (Gao and Vogel, 2008), and FastAlign (Dyer et al., 2013), remain widely used because progress in word alignment accuracy has stagnated. This situation is unfortunate because word alignment could serve many downstream tasks, including projecting linguistic annotation (Yarowsky et al., 2001), projecting XML markups (Hashimoto et al., 2019), and enforcing terminology constraints (pre-specified translations) (Song et al., 2019). Recent works (Garg et al., 2019; Stengel-Eskin et al., 2019; Zenkel et al., 2020) based on the Transformer (Vaswani et al., 2017), the state-of-the-art neural machine translation model, have started to outperform GIZA++.
