Translation Quality Estimation Using Only Bilingual Corpora

Lemao Liu, Eiichiro Sumita, Andrew Finch, Atsushi Fujita, Masao Utiyama

doi:10.1109/taslp.2017.2716195

Abstract

In computer-aided translation scenarios, quality estimation of machine translation hypotheses plays a critical role. Existing methods for word-level translation quality estimation (TQE) rely on manually annotated TQE training data obtained via direct annotation or post-editing. However, because of the cost of human labor, such data are in practice either limited in size or available for only a few tasks. To remove the reliance on such annotated TQE data, this paper proposes an approach that trains word-level TQE models using bilingual corpora, which are routinely used to train machine translation systems and are comparatively easy to obtain. We formalize the training of the proposed method as maximum marginal likelihood estimation. To avoid degenerate solutions, we propose a novel regularized training objective and optimize it with an efficient approximation. Extensive experiments on both written and spoken language datasets show that our approach achieves performance comparable to standard training on annotated data.
