Abstract

In computer-aided translation scenarios, quality estimation of machine translation hypotheses plays a critical role. Existing methods for word-level translation quality estimation (TQE) rely on the availability of manually annotated TQE training data obtained via direct annotation or post-editing. However, due to the cost of human labor, such data are either limited in size or available only for a few tasks in practice. To avoid the reliance on such annotated TQE data, this paper proposes an approach to train word-level TQE models using bilingual corpora, which are typically used for machine translation training and are relatively easy to access. We formalize the training of our proposed method under the framework of maximum marginal likelihood estimation. To avoid degenerate solutions, we propose a novel regularized training objective whose optimization is achieved by an efficient approximation. Extensive experiments on both written and spoken language datasets empirically show that our approach yields performance comparable to standard training on annotated data.
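To illustrate the idea, the following is a minimal sketch of a maximum-marginal-likelihood objective over latent word-level quality tags, with an entropy regularizer to discourage degenerate (e.g. all-OK) solutions. All function names, the binary OK/BAD tag space, the per-word independence assumption, and the specific entropy-based regularizer are illustrative assumptions, not the paper's actual formulation.

```python
import math

def log_marginal_likelihood(tag_probs, obs_given_tag):
    """Log marginal likelihood over latent per-word OK/BAD tags.

    tag_probs: list of p(tag=OK | source, hypothesis) for each word,
        as produced by a hypothetical TQE model (assumption).
    obs_given_tag: pair (p(obs | OK), p(obs | BAD)) — likelihood of the
        observed bilingual signal under each tag (assumption).
    Words are treated as independent for simplicity.
    """
    total = 0.0
    for p_ok in tag_probs:
        # Marginalize out the latent tag: sum over OK and BAD.
        marginal = p_ok * obs_given_tag[0] + (1.0 - p_ok) * obs_given_tag[1]
        total += math.log(marginal)
    return total

def entropy_regularizer(tag_probs):
    """Entropy of the per-word tag distributions; zero iff degenerate."""
    h = 0.0
    for p in tag_probs:
        for q in (p, 1.0 - p):
            if q > 0.0:
                h -= q * math.log(q)
    return h

def regularized_objective(tag_probs, obs_given_tag, lam=0.1):
    """Marginal likelihood plus an entropy bonus (lam is a made-up weight)."""
    return (log_marginal_likelihood(tag_probs, obs_given_tag)
            + lam * entropy_regularizer(tag_probs))
```

Without the regularizer, the objective can be maximized by pushing every tag probability to a degenerate extreme; the entropy term penalizes exactly those collapsed solutions, which is the role the paper's (different, approximated) regularizer plays.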
