A simplification–translation–restoration framework for domain adaptation in statistical machine translation: A case study in medical record translation

Han-Bin Chen,Hen-Hsen Huang,An-Chang Hsieh,Hsin-Hsi Chen

doi:10.1016/j.csl.2016.08.003

Abstract

Integration of in-domain knowledge into an out-of-domain statistical machine translation (SMT) system poses challenges due to the lack of resources. Lack of in-domain bilingual corpora is one such issue. In this paper, we propose a simplification–translation–restoration (STR) framework for domain adaptation in SMT systems. An SMT system to translate medical records from English to Chinese is taken as a case study. We identify the critical segments in a medical sentence and simplify them to alleviate the data sparseness problem in the out-of-domain SMT system. After translating the simplified sentence, the translations of these critical segments are restored to their proper positions. Besides the simplification pre-processing step and the restoration post-processing step, we also enhance the translation and language models in the STR framework by using pseudo bilingual corpora generated by the background MT system. In the experiments, we adapt an SMT system from a government document domain to a medical record domain. The results show the effectiveness of the STR framework.
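The three stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon entries, the `TERM<n>` placeholder scheme, the function names, and the toy word-by-word translator are all assumptions made for illustration. In particular, the sketch swaps each critical segment for a placeholder token, which is only one possible way to simplify; the abstract does not specify how the paper's simplification step works.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-domain lexicon (illustrative entries, not taken from the paper).
MEDICAL_LEXICON: Dict[str, str] = {
    "myocardial infarction": "心肌梗塞",
    "hypertension": "高血壓",
}


def simplify(sentence: str, lexicon: Dict[str, str]) -> Tuple[str, List[str]]:
    """Replace each critical segment with a placeholder and record its translation."""
    translations: List[str] = []
    simplified = sentence
    # Match longer terms first so multi-word terms are not split by shorter ones.
    for term in sorted(lexicon, key=len, reverse=True):
        while term in simplified:
            placeholder = f"TERM{len(translations)}"
            simplified = simplified.replace(term, placeholder, 1)
            translations.append(lexicon[term])
    return simplified, translations


def restore(translated: str, translations: List[str]) -> str:
    """Put the recorded in-domain translations back in place of the placeholders."""
    # Replace higher indices first so TERM1 is not matched inside TERM10.
    for index in reversed(range(len(translations))):
        translated = translated.replace(f"TERM{index}", translations[index])
    return translated


def str_translate(
    sentence: str,
    translate: Callable[[str], str],
    lexicon: Dict[str, str],
) -> str:
    """Simplify, translate with an out-of-domain system, then restore."""
    simplified, translations = simplify(sentence, lexicon)
    return restore(translate(simplified), translations)


def toy_translate(text: str) -> str:
    """Stand-in for the out-of-domain SMT system: a word-by-word lookup (assumption)."""
    table = {"patient": "病人", "has": "有", "and": "和"}
    # Unknown tokens, including placeholders, pass through unchanged.
    return "".join(table.get(token, token) for token in text.split())


if __name__ == "__main__":
    source = "patient has hypertension and myocardial infarction"
    print(str_translate(source, toy_translate, MEDICAL_LEXICON))
```

The point of the design is that the out-of-domain translator never sees the rare medical terms; it only has to carry the placeholders through to the output, where the restoration step substitutes the in-domain translations.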

Journal: Computer Speech & Language
Publication Date: Sep 3, 2016
Citations: 2
