A Maximum-Entropy Segmentation Model for Statistical Machine Translation

Deyi Xiong, Min Zhang, Haizhou Li

doi:10.1109/tasl.2011.2144971

Abstract

Segmentation is of great importance to statistical machine translation: it splits a source sentence into sequences of translatable segments. We propose a maximum-entropy segmentation model to capture desirable phrasal and hierarchical segmentations for statistical machine translation. We present an approach that automatically learns the beginning and ending boundaries of cohesive segments from word-aligned bilingual data without using any additional resources. The learned boundaries are then used to define cohesive segments in both phrasal and hierarchical segmentations. We integrate the segmentation model into phrasal statistical machine translation (SMT) and conduct experiments in the newswire and broadcast news domains to investigate its effectiveness on large-scale training data. Our experimental results show that the maximum-entropy segmentation model significantly improves translation quality in terms of BLEU. We further validate that 1) the proposed segmentation model significantly outperforms the syntactic constraints used in previous work to constrain segmentations; and 2) it is necessary to capture hierarchical segmentations in addition to phrasal segmentations.
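
As a rough illustration of how boundary labels might be read off word-aligned data, the sketch below marks a source word as a beginning (or ending) boundary when it starts (or ends) some source span that is consistent with the alignment. This is an assumption made for illustration only: the abstract does not define "cohesive segment", so the standard phrase-extraction consistency criterion is swapped in here, and the names `is_consistent`, `boundary_labels`, and `min_len` are hypothetical. Labels of this kind could then serve as training examples for a maximum-entropy classifier, which is not shown.

```python
from typing import List, Set, Tuple

# A word alignment as a set of (source index, target index) links.
Alignment = Set[Tuple[int, int]]


def is_consistent(i: int, j: int, alignment: Alignment) -> bool:
    """Return True if source span [i, j] is consistent with the alignment.

    Assumption: "consistent" means that no link touching the target
    range covered by the span comes from a source word outside the span.
    """
    targets = {t for s, t in alignment if i <= s <= j}
    if not targets:
        return False  # a fully unaligned span gives no evidence
    lo, hi = min(targets), max(targets)
    # Every link landing in target range [lo, hi] must start inside [i, j].
    return all(i <= s <= j for s, t in alignment if lo <= t <= hi)


def boundary_labels(
    n_src: int, alignment: Alignment, min_len: int = 2
) -> Tuple[List[bool], List[bool]]:
    """Label each source word as a possible beginning / ending boundary.

    Only spans of at least `min_len` words are considered (an assumption,
    chosen so that single aligned words do not make every word a boundary).
    """
    begins = [False] * n_src
    ends = [False] * n_src
    for i in range(n_src):
        for j in range(i + min_len - 1, n_src):
            if is_consistent(i, j, alignment):
                begins[i] = True
                ends[j] = True
    return begins, ends


# Toy example: four source words, with words 1 and 2 swapped on the target side.
toy_alignment = {(0, 0), (1, 2), (2, 1), (3, 3)}
toy_begins, toy_ends = boundary_labels(4, toy_alignment)
```

In the toy example, the span covering words 1 and 2 is consistent because the swap stays inside it, whereas the span covering words 0 and 1 is not, since target position 1 links back to source word 2.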

Journal: IEEE Transactions on Audio, Speech, and Language Processing
Publication Date: Nov 1, 2011
Citations: 38

Similar Papers

Seal: Efficient Training Large Scale Statistical Machine Translation Models on Spark
Rong Gu ... Chunfeng Yuan
01 Dec 2018

Bi-Text Alignment of Movie Subtitles for English-Arabic Statistical Machine Translation
Fahad Ahmed Al-Obaidli ... Stephen Cox
01 Jan 2015

Modeling Term Translation for Document-informed Machine Translation
Fandong Meng ... Wenbin Jiang
01 Jan 2014

Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language
Nghia-Luan Pham ... Van-Vinh Nguyen
VNU Journal of Science: Computer Science and Communication Engineering | VOL. 36
30 May 2020
