A comparative evaluation of data-driven models in translation selection of machine translation

Yu-Seop Kim, Jeong-Ho Chang, Byoung-Tak Zhang

doi:10.3115/1072228.1072300

Abstract

We present a comparative evaluation of two data-driven models for translation selection in English-Korean machine translation: latent semantic analysis (LSA) and probabilistic latent semantic analysis (PLSA). Both models can represent the complex semantic structure of a given context, such as a text passage. Translation selection relies primarily on grammatical relationships stored in dictionaries. For instances not found in the dictionary, we use k-nearest neighbor (k-NN) learning to select an appropriate translation, with the distance between instances computed from the similarity measured by LSA or PLSA. In our experiments, we used TREC data (AP news from 1988) to construct the latent semantic spaces of the two models and the Wall Street Journal corpus to evaluate the translation accuracy of each model. PLSA selected relatively more accurate translations than LSA, irrespective of the value of k and the type of grammatical relationship.
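The k-NN selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the dictionary entries, the romanized translation labels, and the helper names (`lsa_word_vectors`, `select_translation`) are all assumptions made for illustration. Only the LSA variant is shown, with cosine similarity in a truncated SVD space standing in for the similarity measure.

```python
from collections import Counter

import numpy as np

# Toy corpus (an assumption; the paper built its spaces from AP news text).
DOCS = [
    "credit money loan deposit",
    "credit money account deposit",
    "loan account money interest",
    "river water shore fish",
    "river water boat shore",
    "fish boat water stream",
]

VOCAB = sorted({w for d in DOCS for w in d.split()})
INDEX = {w: i for i, w in enumerate(VOCAB)}

# Term-document count matrix: rows are words, columns are documents.
X = np.zeros((len(VOCAB), len(DOCS)))
for j, doc in enumerate(DOCS):
    for w in doc.split():
        X[INDEX[w], j] += 1.0


def lsa_word_vectors(matrix, rank):
    """Project words into a rank-dimensional latent space via truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank] * s[:rank]


def cosine(a, b):
    """Cosine similarity between two latent word vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def select_translation(query, dictionary, vectors, k):
    """Vote among the k dictionary entries most similar to the unseen word."""
    q = vectors[INDEX[query]]
    ranked = sorted(
        dictionary,
        key=lambda entry: cosine(q, vectors[INDEX[entry[0]]]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical dictionary for the English word "bank": each known collocate
# maps to a romanized Korean translation ("eunhaeng" = financial bank,
# "duk" = river bank).
DICTIONARY = [
    ("money", "eunhaeng"),
    ("loan", "eunhaeng"),
    ("deposit", "eunhaeng"),
    ("river", "duk"),
    ("water", "duk"),
    ("shore", "duk"),
]

W = lsa_word_vectors(X, rank=2)

# "interest" and "stream" are not in the dictionary, so they are resolved
# through their nearest neighbors in the latent space.
financial = select_translation("interest", DICTIONARY, W, k=3)
riverside = select_translation("stream", DICTIONARY, W, k=3)
```

A PLSA variant would replace `lsa_word_vectors` with word representations derived from the fitted latent-class probabilities, leaving the k-NN vote unchanged.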

Similar Papers

Resolving Target Word Selection Ambiguity Using Raw Text Data in English-Korean Machine Translation
... Jeong-Ho Chang
The KIPS Transactions: Part B | Vol. 11B
01 Oct 2004

Target Word Selection Using WordNet and Data-Driven Models in Machine Translation
Yuseop Kim ... Byoung-Tak Zhang
01 Jan 2002

Comparing the Performance of Latent Semantic Analysis and Probability Latent Semantic Analysis Models on Autoscoring Essay Tasks
Xiaohua Ke ... Haijiao Luo
01 Jan 2017

An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition
Yu-Seop Kim ... Jeong-Ho Chang
01 Jan 2003
