Abstract

Most current quality estimation (QE) models for machine translation are trained and evaluated in a fully supervised setting that requires significant quantities of labelled training data. However, obtaining labelled data can be both expensive and time-consuming. In addition, the test data that a deployed QE model is exposed to may differ from its training data in significant ways. In particular, training samples are often labelled by one or a small set of annotators, whose perceptions of translation quality and whose needs may differ substantially from those of the end-users who will rely on the predictions in practice. It is therefore desirable to adapt QE models efficiently to new user data with limited supervision. To address these challenges, we propose a Bayesian meta-learning approach for adapting QE models to the needs and preferences of each user with limited supervision. To further enhance performance, we extend a state-of-the-art Bayesian meta-learning approach with a matrix-valued kernel for Bayesian meta-learning of quality estimation. Experiments on data with varying numbers of users and language characteristics demonstrate that the proposed Bayesian meta-learning approach delivers improved predictive performance in both limited and full supervision settings.

Highlights

  • Quality Estimation (QE) models aim to evaluate the output of Machine Translation (MT) systems at run-time, when no reference translations are available (Blatz et al., 2004; Specia et al., 2009).

  • We further improve the performance of Bayesian meta-learning for the task of quality estimation by extending the state-of-the-art Bayesian Model-Agnostic Meta-Learning (BMAML) approach of Kim et al. (2018) to utilize Stein Variational Gradient Descent (Liu and Wang, 2016) with matrix-valued kernels (Wang et al., 2019), and demonstrate that this leads to enhanced predictive performance in both limited and full supervision settings.

  • In this work we propose to improve the predictive performance of BMAML for quality estimation by using matrix-valued Stein Variational Gradient Descent (Matrix-SVGD), which employs matrix-valued kernels for more effective parameter updates, in place of the original SVGD algorithm. Particle parameters are initialized from the model’s parameters and updated with K steps of Matrix-SVGD (using Equations (2) and (4) to (7)).
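The K-step particle update described in the last highlight can be sketched with the plain (scalar-kernel) SVGD of Liu and Wang (2016); the paper's Matrix-SVGD variant replaces the scalar RBF kernel below with a matrix-valued kernel, but the structure of the update is the same. The function names, fixed bandwidth, and step size here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(X, h):
    """Scalar RBF kernel k(x_j, x_i) = exp(-||x_j - x_i||^2 / h) and its
    gradient with respect to x_j, over a particle set X of shape (n, d)."""
    diff = X[:, None, :] - X[None, :, :]          # (n, n, d): x_j - x_i
    K = np.exp(-np.sum(diff ** 2, axis=-1) / h)   # (n, n)
    grad_K = -2.0 / h * diff * K[:, :, None]      # (n, n, d): grad_{x_j} k
    return K, grad_K

def svgd_step(X, grad_logp, step=0.1, h=1.0):
    """One SVGD update of the particles X (n, d).

    grad_logp: function mapping X -> (n, d) score gradients of the target.
    phi_i = (1/n) * sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    """
    n = X.shape[0]
    K, grad_K = rbf_kernel(X, h)
    phi = (K.T @ grad_logp(X) + grad_K.sum(axis=0)) / n
    return X + step * phi
```

Running this for a few hundred steps with the score of a standard Gaussian (`grad_logp = lambda P: -P`) drives particles initialized far from the origin toward samples of that Gaussian; the kernel-gradient term acts as a repulsive force that keeps particles spread out rather than collapsing to the mode. In Matrix-SVGD, `K` becomes a stack of d-by-d matrices that can precondition each particle's update direction.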


Summary

Introduction

Quality Estimation (QE) models aim to evaluate the output of Machine Translation (MT) systems at run-time, when no reference translations are available (Blatz et al., 2004; Specia et al., 2009). The perception of the quality of MT output can be subjective, and the quality estimates obtained from a model trained on data from one set of users may not serve the needs of a different set of users. Most existing QE models are trained and evaluated in a fully supervised setting which assumes access to substantial quantities of labelled supervision data, which may not be available and can be expensive and time-consuming to obtain. We further improve the performance of Bayesian meta-learning for the task of quality estimation by extending the state-of-the-art Bayesian Model-Agnostic Meta-Learning (BMAML) approach of Kim et al. (2018) to utilize Stein Variational Gradient Descent (Liu and Wang, 2016) with matrix-valued kernels (Wang et al., 2019), and demonstrate that this leads to enhanced predictive performance in both limited and full supervision settings.

Model-Agnostic Meta-Learning
Stein Variational Gradient Descent
Stein Variational Gradient Descent with Matrix-Valued Kernels
Bayesian Model-Agnostic Meta-Learning
QE Model
Limited Supervision Results
Full Supervision Results
Conclusions
A Additional Experimental Details