Abstract

Translation quality estimation (QE) is the task of estimating the quality of output from an unknown machine translation (MT) system without reference translations, at various levels of granularity (sentence, word, or phrase). It has attracted much attention because of its potential to reduce human post-editing effort. However, QE suffers heavily from the fact that quality annotation data remain expensive and scarce. In this paper, we focus on this limited-data problem and investigate how to use the high-level latent features learned by pre-trained language models to improve QE. Specifically, we explore three strategies for integrating pre-trained language representations into QE models: (1) a mixed integration model, where the pre-trained language features are combined with other features for QE; (2) a direct integration model, which treats the pre-trained language model as the only feature-extracting component of the entire QE model; and (3) a constrained integration model, which adds a constraint mechanism to the direct integration model to optimize the quality prediction. The experiments and analysis presented in this paper demonstrate the effectiveness of our approaches on the QE task.
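
To make the difference between the first two strategies concrete, the following is a minimal sketch under stated assumptions, not the architecture used in the paper. The functions `pretrained_features`, `other_features`, and `predict_quality` are hypothetical names; the encoder is replaced by random vectors, the "other" features are simple length counts, and the sigmoid regression head is an assumption. The constrained integration model is omitted because the abstract does not specify its constraint mechanism.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def pretrained_features(pairs, dim=8):
    """Hypothetical stand-in for a pre-trained language model encoder.

    Returns one random vector per (source, translation) pair; a real
    system would return learned representations instead.
    """
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in pairs]


def other_features(pairs):
    """Hypothetical additional features: source and translation lengths."""
    return [[float(len(src.split())), float(len(tgt.split()))]
            for src, tgt in pairs]


def predict_quality(features, weights):
    """Assumed prediction head: linear layer plus sigmoid, one score per row."""
    scores = []
    for row in features:
        z = sum(x * w for x, w in zip(row, weights))
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


pairs = [("das ist gut", "this is good"),
         ("ein kleiner Test", "a small test case")]

lm = pretrained_features(pairs)
extra = other_features(pairs)

# (1) Mixed integration: pre-trained features are concatenated with
# the other features before the prediction head.
mixed = [a + b for a, b in zip(lm, extra)]
mixed_weights = [random.gauss(0.0, 0.1) for _ in range(len(mixed[0]))]
mixed_scores = predict_quality(mixed, mixed_weights)

# (2) Direct integration: the pre-trained model is the only feature
# extractor, so its features feed the prediction head unchanged.
direct_weights = [random.gauss(0.0, 0.1) for _ in range(len(lm[0]))]
direct_scores = predict_quality(lm, direct_weights)
```

In this sketch the two strategies differ only in what reaches the prediction head: the mixed input has ten dimensions (eight from the stand-in encoder plus two length features), while the direct input has the eight encoder dimensions alone. The weights are random, so the scores carry no meaning beyond showing the data flow.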
