Abstract

Mongolian-Chinese statistical machine translation (SMT) systems are limited by significant syntactic differences between the two languages, complex Mongolian morphology, and the scarcity of parallel corpora. Template-based machine translation (TBMT) can produce accurate, syntactically well-formed translations from a relatively small corpus, so combining an SMT system with a TBMT system can yield better translations. We built a Mongolian-Chinese TBMT system consisting of a template extraction model and a template translation model. We propose a novel method that aligns and abstracts static words from bilingual parallel examples to extract templates automatically. We also propose a method for filtering out low-quality TBMT translations to improve the combined systems. Moreover, we apply lemmatization and latinization to address data sparsity and to support fuzzy matching. In our experiments, the TBMT translations outperformed the phrase-based SMT and hierarchical phrase-based SMT baselines. The systems combining TBMT with SMT also outperformed the baselines. In addition, the coverage of the TBMT system is sufficient for use in the combined systems.
