Phrase Table Induction Using Monolingual Data for Low-Resource Statistical Machine Translation

Benjamin Marie, Atsushi Fujita

doi:10.1145/3168054

Open Access

Abstract

We propose a new method for inducing a phrase-based translation model from a pair of unrelated monolingual corpora. Our method is able to deal with phrases of arbitrary length and to find phrase pairs that are useful for statistical machine translation, without requiring large parallel or comparable corpora. First, our method generates phrase pairs by coupling source and target phrases collected separately from the respective monolingual data. Then, for each phrase pair, we compute features using the monolingual data and a small quantity of parallel sentences. Finally, incorrect phrase pairs are pruned, and a phrase table is built from the remaining phrase pairs. In our experiments on French-Japanese and Spanish-Japanese translation tasks under low-resource conditions, we observe that incorporating a phrase table induced by our method into the machine translation system leads to large improvements in translation quality. Furthermore, we show that a phrase table induced by our method can also be useful in a wide range of configurations, including configurations where we already have access to large parallel corpora and configurations where only small monolingual corpora are available.
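
The three steps named in the abstract (generate candidate phrase pairs, compute features, prune) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the function names, the n-gram phrase collection, the single lexical-coverage feature, and the threshold are all assumptions made for illustration, and the hypothetical `seed_lexicon` stands in for whatever information would be derived from the small set of parallel sentences.

```python
from collections import Counter
from itertools import product


def collect_phrases(corpus, max_len=3, min_count=1):
    """Collect word n-grams (up to max_len words) seen at least min_count times."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return [phrase for phrase, count in counts.items() if count >= min_count]


def lexical_score(src_phrase, tgt_phrase, seed_lexicon):
    """Hypothetical feature: share of words on both sides that have a
    translation partner in the other phrase according to the seed lexicon."""
    src_words = src_phrase.split()
    tgt_words = tgt_phrase.split()
    src_hits = sum(any((s, t) in seed_lexicon for t in tgt_words) for s in src_words)
    tgt_hits = sum(any((s, t) in seed_lexicon for s in src_words) for t in tgt_words)
    return (src_hits + tgt_hits) / (len(src_words) + len(tgt_words))


def induce_phrase_table(src_corpus, tgt_corpus, seed_lexicon, threshold=1.0):
    """Couple phrases from two monolingual corpora, score each pair, and prune."""
    # Step 1: collect phrases separately from each monolingual corpus.
    src_phrases = collect_phrases(src_corpus)
    tgt_phrases = collect_phrases(tgt_corpus)
    table = {}
    # Step 2: couple every source phrase with every target phrase and score it.
    for src, tgt in product(src_phrases, tgt_phrases):
        score = lexical_score(src, tgt, seed_lexicon)
        # Step 3: prune pairs whose score falls below the threshold.
        if score >= threshold:
            table[(src, tgt)] = score
    return table


if __name__ == "__main__":
    # Toy French-English data, chosen only to keep the example readable.
    src_corpus = ["le chat noir"]
    tgt_corpus = ["the black cat"]
    seed_lexicon = {("le", "the"), ("chat", "cat"), ("noir", "black")}
    for (src, tgt), score in induce_phrase_table(src_corpus, tgt_corpus, seed_lexicon).items():
        print(f"{src} ||| {tgt} ||| {score:.2f}")
```

Pairing every source phrase with every target phrase is quadratic in the number of phrases, so a sketch like this only works on toy data; a practical system would need to restrict the candidate set before scoring.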

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: Feb 13, 2018
Citations: 9

Similar Papers

Minimum Bayes-Risk Phrase Table Pruning for Pivot-Based Machine Translation in Internet of Things
Xiaoning Zhu ... Muyun Yang
IEEE Access | VOL. 6
01 Jan 2018

Exploiting Parallel Treebanks to Improve Phrase-Based Statistical Machine Translation
John Tinsley ... Andy Way
01 Jan 2009

Automatically generated parallel treebanks and their exploitability in machine translation
John Tinsley ... Andy Way
Machine Translation | VOL. 23
01 Feb 2009

A Phrase Table Filtering Model Based on Binary Classification for Uyghur-Chinese Machine Translation
...
Journal of Computers | VOL. 9
12 Jan 2014
