Abstract
A syntactic reordering model is proposed on the basis of the phrase-based statistical translation model in order to account for long-distance reordering in machine translation. The method makes full use of information obtained from monolingual and bilingual corpora within a maximum entropy model framework. Unlike previous collocation translation methods, which rely heavily on bilingual corpora, the new method can also train the translation model on monolingual corpora. Context information is introduced on top of the internal matching information, and the EM algorithm is used to estimate context-based lexical translation probabilities. The model also segments the syntax tree according to the phrase segmentation, which avoids inconsistency between phrases and syntactic structures. The reordering of certain structures in the syntax tree is determined from the phrase alignment and from the word alignment within phrases. The reordering probability of a sub-structure is calculated from the reordering probability at each of its nodes and is used as a feature function of the log-linear model. In experiments, the model scores significantly higher than the classic phrase-based statistical translation model. The results show that the syntactic reordering model is effective for phrase-based statistical machine translation and that syntactic knowledge and the phrase translation process can be better combined. They also show that the model outperforms the phrase-based statistical machine translation model in both its ability to generalize translation knowledge and its translation results.
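As a minimal illustration of how a sub-structure reordering probability might enter a log-linear model, the sketch below assumes that the per-node reordering probabilities are combined as a product (summed in log space). The abstract does not specify the combination rule, so this is an assumption, and the function names, feature names, weights, and probability values are all hypothetical.

```python
import math


def substructure_reordering_logprob(node_probs):
    """Log reordering probability of a sub-structure.

    Assumption (not stated in the abstract): the node-level reordering
    probabilities are independent and are multiplied together, which
    corresponds to summing their logarithms.
    """
    if any(p <= 0.0 or p > 1.0 for p in node_probs):
        raise ValueError("node probabilities must lie in (0, 1]")
    return sum(math.log(p) for p in node_probs)


def loglinear_score(features, weights):
    """Weighted sum of feature values, as in a log-linear model."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical reordering probabilities at three nodes of one sub-structure.
node_probs = [0.9, 0.6, 0.8]

# Hypothetical feature values (log-probabilities) and feature weights.
features = {
    "phrase_translation": math.log(0.05),
    "language_model": math.log(0.01),
    "syntactic_reordering": substructure_reordering_logprob(node_probs),
}
weights = {
    "phrase_translation": 1.0,
    "language_model": 0.5,
    "syntactic_reordering": 0.7,
}

score = loglinear_score(features, weights)
```

In a decoder, a score of this kind would be computed for each candidate translation and the highest-scoring candidate kept; the feature weights would normally be tuned on held-out data rather than set by hand.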