Abstract

We propose a language-independent approach to cleaning up the word alignment errors that an unsupervised word alignment process introduces into an aligned parallel corpus. In such a corpus, we evaluate one-to-many alignment patterns involving discontinuous words using statistical measures of collocation. Alignments of discontinuous words that lack strong collocation tendencies are treated as errors and deleted. We conduct experiments on Japanese-English and German-English translation tasks in both directions. The experimental results show that filtering state-of-the-art word alignments with the proposed approach leads to better translation performance.
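
To make the filtering idea concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not name the collocation measure, so sentence-level pointwise mutual information (PMI) is assumed here purely for illustration. The function names, the threshold, and the choice to drop every link of a rejected group are likewise hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_stats(sentences):
    """Count, per word and per word pair, how many sentences contain them."""
    word_counts, pair_counts = Counter(), Counter()
    for sent in sentences:
        types = sorted(set(sent))
        word_counts.update(types)
        pair_counts.update(combinations(types, 2))  # pairs are ordered (x < y)
    return word_counts, pair_counts, len(sentences)


def pmi(a, b, stats):
    """Sentence-level PMI (assumed measure; the abstract does not specify one)."""
    word_counts, pair_counts, n = stats
    joint = pair_counts[tuple(sorted((a, b)))]
    if joint == 0:
        return float("-inf")
    return math.log2(joint * n / (word_counts[a] * word_counts[b]))


def filter_alignment(alignment, target, stats, threshold=0.0):
    """Drop one-to-many links whose discontinuous target words collocate weakly.

    alignment: iterable of (source_index, target_index) links for one sentence.
    target:    list of target-side tokens for that sentence.
    threshold: hypothetical cut-off on the weakest PMI within a group.
    """
    by_src = {}
    for s, t in alignment:
        by_src.setdefault(s, set()).add(t)

    kept = []
    for s, ts in by_src.items():
        ts = sorted(ts)
        # Discontinuous: the aligned target positions leave at least one gap.
        discontinuous = len(ts) > 1 and ts[-1] - ts[0] + 1 > len(ts)
        if discontinuous:
            weakest = min(pmi(target[i], target[j], stats)
                          for i, j in zip(ts, ts[1:]))
            if weakest < threshold:
                continue  # assumption: delete every link of the group
        kept.extend((s, t) for t in ts)
    return sorted(kept)
```

In this sketch, a source word aligned to a separated pair such as "turned ... off" would keep its links when the two target words co-occur often in the corpus, while a link pair joining two unrelated, non-adjacent words would be removed. Contiguous and one-to-one alignments pass through unchanged.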
