Abstract

Collocation information plays an important role in target-word selection for machine translation. However, a collocation dictionary covers only a limited portion of the selection task because of data sparseness. To resolve the sparseness problem, we propose a new methodology that selects target words after determining an appropriate collocation class using inter-word semantic similarity. We estimate the similarity by computing the semantic distance between two synsets in WordNet and the term-to-term similarity in data-driven models. In WordNet, the semantic similarity between two words can be calculated as the reciprocal of the Semantic Distance (SD). For the calculation of the SD, each synset in WordNet is assigned an M-value, computed as \( \text{M-value} = \tfrac{radix}{sf^{p}} \), where \( radix \) is the initial M-value, \( sf \) is a scale factor, and \( p \) is the number of edges from the root to the synset. As data-driven models, we use Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA), a probabilistic reformulation of LSA. LSA applies singular value decomposition (SVD) to the term-document matrix. SVD is a form of factor analysis and is defined as \( A = U\Sigma V^{T} \), where \( \Sigma \) is a diagonal matrix whose entries are the nonzero singular values of \( A \) (the square roots of the nonzero eigenvalues of \( AA^{T} \) or \( A^{T}A \)), and \( U \) and \( V \) are the orthogonal matrices of eigenvectors associated with the \( r \) nonzero eigenvalues of \( AA^{T} \) and \( A^{T}A \), respectively. The term-to-term similarities are based on the inner products between the row vectors of \( A \), since \( AA^{T} = U\Sigma^{2}U^{T} \). To compute the similarity of \( w_1 \) and \( w_2 \) in PLSA, \( \sum_{z} P(z \mid w_1)\,P(z \mid w_2) \) is computed, where \( P(z \mid w) \) is derived by Bayes' rule as \( P(z \mid w) = \tfrac{P(z)\,P(w \mid z)}{\sum_{z'} P(z')\,P(w \mid z')} \) and \( z \) ranges over latent contexts.
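The similarity measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy inputs, and the parameter values are assumptions, and the LSA similarity is shown directly as the row inner product of \( A \) (the entry of \( AA^{T} = U\Sigma^{2}U^{T} \)) rather than via an explicit SVD.

```python
# Illustrative sketch of the similarity computations described in the
# abstract. All names and example values are assumptions for demonstration.

def m_value(radix, sf, p):
    """M-value of a synset p edges below the root: radix / sf**p."""
    return radix / (sf ** p)

def wordnet_similarity(sd):
    """WordNet similarity as the reciprocal of the Semantic Distance (SD)."""
    return 1.0 / sd

def lsa_term_similarity(A, i, j):
    """Inner product of rows i and j of the term-document matrix A;
    this is entry (i, j) of A A^T = U Sigma^2 U^T."""
    return sum(a * b for a, b in zip(A[i], A[j]))

def plsa_term_similarity(p_z, p_w_given_z, w1, w2):
    """PLSA similarity: sum_z P(z|w1) P(z|w2), with P(z|w) obtained by
    Bayes' rule: P(z|w) = P(z) P(w|z) / sum_z' P(z') P(w|z')."""
    def p_z_given_w(w):
        joint = [p_z[k] * p_w_given_z[k][w] for k in range(len(p_z))]
        norm = sum(joint)
        return [x / norm for x in joint]
    return sum(a * b for a, b in zip(p_z_given_w(w1), p_z_given_w(w2)))
```

For example, with \( radix = 100 \), \( sf = 2 \), and \( p = 3 \), the M-value is \( 100 / 2^{3} = 12.5 \); the PLSA similarity needs only the mixture weights \( P(z) \) and the topic-conditional word distributions \( P(w \mid z) \).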
