Abstract

A syntactic reordering model is proposed on the basis of the phrase-based statistical translation model in order to handle long-distance reordering in machine translation. The method makes full use of information obtained from monolingual and bilingual corpora under a maximum entropy model framework. Unlike previous collocation translation methods, which rely heavily on bilingual corpora, the new method can train the translation model on monolingual corpora. Context information is introduced in addition to the internal information used for matching, and the EM algorithm is adopted to estimate context-based lexical translation probabilities. The model also segments the syntax tree according to the phrase segmentation, which avoids inconsistency between phrases and syntactic structures. The reordering of sub-structures in the syntax tree is determined from the phrase alignment and from the word alignment within phrases. The reordering probability of a sub-structure is calculated from the reordering probabilities at its nodes and is used as a feature function in the log-linear model. In experiments, the model scores significantly higher than the classic phrase-based statistical translation model. The results show that the syntactic reordering model is effective for phrase-based statistical machine translation and that it combines syntactic knowledge with the phrase translation process more closely. They also show that the model outperforms the phrase-based statistical machine translation model in both generalization of translation knowledge and translation quality.
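To illustrate how a sub-structure reordering probability could enter a log-linear model as a feature, here is a minimal Python sketch. It assumes that the sub-structure probability is the product of the per-node reordering probabilities; the abstract says only that it is calculated from them. The function names, feature names, weights, and probability values are all hypothetical and are not taken from the paper.

```python
import math

def substructure_reorder_logprob(node_probs):
    """Log-probability of a sub-structure's reordering.

    Assumption: the sub-structure probability is the product of the
    reordering probabilities at its nodes, so the log is a sum.
    """
    return sum(math.log(p) for p in node_probs)

def loglinear_score(features, weights):
    """Weighted sum of feature values, as in a standard log-linear model."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical per-node reordering probabilities for one sub-structure.
node_probs = [0.5, 0.25]

# Hypothetical feature values (log-domain) and feature weights.
features = {
    "translation": math.log(0.5),
    "reordering": substructure_reorder_logprob(node_probs),
}
weights = {"translation": 1.0, "reordering": 2.0}

score = loglinear_score(features, weights)
```

In a decoder, a score of this kind would be computed for each translation hypothesis, and the feature weights would typically be tuned on held-out data.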
