Improving Word Alignment Using Linguistic Code Switching Data

Fei Huang, Alexander Yates

doi:10.3115/v1/e14-1001

Abstract

Linguistic Code Switching (LCS) is a situation where two or more languages appear within a single conversation. For example, in English-Chinese code switching, a sentence may be written mostly in Chinese yet contain the English word “meeting” (the sentence as a whole meaning “We will have a meeting in 15 minutes”). Traditional machine translation (MT) systems treat LCS data as noise, or simply as regular sentences. However, if LCS data is processed intelligently, it can provide a useful signal for training word alignment and MT models. Moreover, LCS data comes from non-news sources, which can increase the diversity of MT training data. In this paper, we first extract constraints from this code switching data and then incorporate them into a word alignment model training procedure. We also show that, by using the code switching data, we can jointly train a word alignment model and a language model using co-training. Our techniques for incorporating LCS data improve BLEU score by 2.64 over a baseline MT system trained using only standard sentence-aligned corpora.
