Measuring domain similarity for statistical machine translation

Lin Liu, Hailong Cao, Tiejun Zhao

doi:10.1109/fskd.2013.6816269

Abstract

It is well known that statistical machine translation (SMT) performance suffers when a model is applied to out-of-domain data. It is also known that the more similar the test domain and the training domain are, the more effective the training data are for SMT performance. Hence, measuring domain similarity is an important step in selecting appropriate training data. The most widely used method combines the cosine similarity function with word frequency. The lack of exploration of other approaches motivates us to propose and compare several similarity measures. Aiming for better SMT performance, we compared 10 similarity measures, each a combination of one of 2 feature representations and one of 5 similarity functions. The results show that using relative word frequency as the feature representation and skew divergence as the similarity function performs best among the 10 measures and outperforms random data selection.
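To make the two measures named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract gives no formulas, so the sketch assumes the common definition of skew divergence as KL(p || alpha*q + (1 - alpha)*p), an assumed alpha of 0.99, whitespace tokenization, and toy sentences. All function names are hypothetical.

```python
import math
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its count divided by the total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse feature vectors (dicts)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def skew_divergence(p, q, alpha=0.99):
    """KL(p || alpha*q + (1 - alpha)*p); lower means more similar.

    Mixing a little of p into q keeps the value finite when q lacks
    words that occur in p. alpha=0.99 is an assumed value, not one
    taken from the paper.
    """
    total = 0.0
    for word, pw in p.items():
        mixed = alpha * q.get(word, 0.0) + (1.0 - alpha) * pw
        total += pw * math.log(pw / mixed)
    return total


# Toy data (invented for illustration): one test domain, two candidates.
test_domain = relative_frequencies("the patient received the drug".split())
medical = relative_frequencies("the drug helped the patient".split())
news = relative_frequencies("the market fell on monday".split())

for name, candidate in [("medical", medical), ("news", news)]:
    print(name,
          cosine_similarity(test_domain, candidate),
          skew_divergence(test_domain, candidate))
```

Note that cosine similarity grows with similarity while skew divergence shrinks, so a data selection step built on this sketch would rank candidate corpora in opposite directions for the two functions.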

Similar Papers

Investigating the Relationship between Classification Quality and SMT Performance in Discriminative Reordering Models
Arefeh Kazemi ... Antonio Toral
Entropy | VOL. 19
24 Aug 2017

Mining Parallel Resources for Machine Translation from Comparable Corpora
Santanu Pal ... Partha Pakray
01 Jan 2015

English Translation Model Design Based on Neural Network
Xiangrong Liu
31 Jul 2019

Efficient data selection for machine translation
A Mandal ... A Stolcke
01 Dec 2008
