Bilingual recursive neural network based data selection for statistical machine translation

Derek F Wong,Yi Lu,Lidia S Chao

doi:10.1016/j.knosys.2016.05.003

Journal: Knowledge-Based Systems
Publication Date: May 9, 2016
Citations: 14

Affiliation: University of Macau

#Data Selection Approaches #Out-domain Data

Abstract

Data selection is a widely used and effective solution to domain adaptation in statistical machine translation (SMT). The dominant methods are perplexity-based ones, which do not consider the mutual translations of sentence pairs and tend to select short sentences. In this paper, to address these problems, we propose bilingual semi-supervised recursive neural network data selection methods to differentiate domain-relevant data from out-domain data. The proposed methods are evaluated in the task of building domain-adapted SMT systems. We present extensive comparisons and show that the proposed methods outperform the state-of-the-art data selection approaches.
