Syntactically Lexicalized Phrase-Based SMT

H Hassan, A Way, K Sima'an

doi:10.1109/tasl.2008.925870

Abstract

Until quite recently, extending phrase-based statistical machine translation (PBSMT) with syntactic knowledge caused system performance to deteriorate. The most recent successful enrichments of PBSMT with hierarchical structure either employ nonlinguistically motivated syntax for capturing hierarchical reordering phenomena, or extend the phrase translation table with redundantly ambiguous syntactic structures over phrase pairs. In this paper, we present an extended, harmonized account of our previous work which showed that incorporating linguistically motivated lexical syntactic descriptions, called supertags, can yield significantly better PBSMT systems at insignificant extra computational cost. We describe a novel PBSMT model that integrates supertags into the target language model and the target side of the translation model. Two kinds of supertags are employed: those from lexicalized tree-adjoining grammar and combinatory categorial grammar. Despite the differences between the two sets of supertags, they give similar improvements. In addition to integrating the Markov supertagging approach in PBSMT, we explore the utility of a new surface grammaticality measure based on combinatory operators. We perform various experiments on the Arabic-to-English NIST 2005 test set addressing the issues of sparseness, scalability, and the utility of system subcomponents. We show that even when the parallel training data grows very large, the supertagged system retains a relatively stable absolute performance advantage over the unadorned PBSMT system. Arguably, this hints at a performance gap that cannot be bridged by acquiring more phrase pairs. Our best result shows a relative improvement of 6.1% over a state-of-the-art PBSMT model, which compares favorably with the leading systems on the NIST 2005 task. We also demonstrate that the advantages of a supertag-based system carry over to German-English, where improvements of up to 8.9% relative to the baseline system are observed.

Journal: IEEE Transactions on Audio, Speech, and Language Processing
Publication Date: Sep 1, 2008
Citations: 68

Similar Papers

Sentence Similarity-Based Source Context Modelling in PBSMT
Rejwanul Haque ... Marta R Costa-Jussa
01 Dec 2010

Toward Building a Comprehensive Phrase-based English-Arabic Statistical Machine Translation System
Sara Ebrahim ... Doaa Hegazy
The Egyptian Journal of Language Engineering | VOL. 4
15 Sep 2017

Lexical syntax for Arabic SMT
Hany Hassan
01 Jan 2012

Incorporating syntax-based language models in phrase-based SMT models
Yidong Chen ... Qingyang Hong
01 Nov 2008
