Translation of Untranslatable Words - Integration of Lexical Approximation and Phrase-Table Extension Techniques into Statistical Machine Translation

Michael Paul, Eiichiro Sumita, Karunesh Arora

doi:10.1587/transinf.e92.d.2378

Abstract

This paper proposes a method for handling out-of-vocabulary (OOV) words that cannot be translated using conventional phrase-based statistical machine translation (SMT) systems. For a given OOV word, lexical approximation techniques are utilized to identify spelling and inflectional word variants that occur in the training data. All OOV words in the source sentence are then replaced with appropriate word variants found in the training corpus, thus reducing the number of OOV words in the input. Moreover, in order to increase the coverage of such word translations, the SMT translation model is extended by adding new phrase translations for all source language words that do not have a single-word entry in the original phrase-table but only appear in the context of larger phrases. The effectiveness of the proposed methods is investigated for the translation of Hindi to English, Chinese, and Japanese.
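The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `approximate_oov` and `words_without_single_entry`, the `cutoff` parameter, and the toy data are all hypothetical. Generic string similarity from Python's `difflib` stands in for the paper's spelling and inflectional variant detection. The second function only finds the source words that lack a single-word phrase-table entry; how their translations are obtained is not described in the abstract and is left out.

```python
import difflib


def approximate_oov(tokens, vocabulary, cutoff=0.8):
    """Replace each OOV token with its closest in-vocabulary variant.

    Assumption: string similarity is used here as a stand-in for the
    paper's spelling/inflection-based lexical approximation.
    """
    vocab_list = sorted(vocabulary)
    result = []
    for tok in tokens:
        if tok in vocabulary:
            result.append(tok)
            continue
        # Best single candidate whose similarity ratio is >= cutoff.
        matches = difflib.get_close_matches(tok, vocab_list, n=1, cutoff=cutoff)
        # Leave the token unchanged when no close variant exists.
        result.append(matches[0] if matches else tok)
    return result


def words_without_single_entry(phrase_table):
    """Return source words that occur only inside multi-word phrases.

    `phrase_table` is assumed to map source phrases (space-separated
    strings) to target phrases; a real phrase-table also carries scores.
    """
    single = {src for src in phrase_table if len(src.split()) == 1}
    in_phrases = {word for src in phrase_table for word in src.split()}
    return in_phrases - single


# Toy usage with made-up data.
vocabulary = {"the", "translate", "translation", "word"}
print(approximate_oov(["the", "translatoin", "xyz"], vocabulary))

phrase_table = {"new york city": "X", "city": "Y", "big apple": "Z"}
print(sorted(words_without_single_entry(phrase_table)))
```

In this sketch a token with no sufficiently close variant stays OOV, which mirrors the abstract's claim that the number of OOV words is reduced rather than eliminated.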


Journal: IEICE Transactions on Information and Systems
Publication Date: Jan 1, 2009
Citations: 1
License type: free


