Abstract

Parallel corpora are a key resource for Statistical Machine Translation (SMT). SMT models work well with timing-aligned parallel corpora, but imperfectly aligned sentences in a bilingual corpus typically lead to poorer translations from the trained model. How to use non-timing-aligned parallel corpora effectively in SMT has not been thoroughly researched. The goal of this paper is to improve the accuracy of an English-to-Thai SMT model by improving the sentence alignment of the parallel corpora. This work proposes an improved English-Thai translation framework for non-timing-aligned parallel corpora based on an improved alignment algorithm: Bleualign with explicit user feedback. The resulting alignment is then supplied to the Moses SMT training system to produce English-Thai translations. The experiment uses English and Thai subtitles obtained from TED (www.ted.com) to build the parallel corpora. Because the TED corpus sentences are not timing aligned, this research generates an alignment model for use with the Moses SMT training system. The results show that the model built with the proposed algorithm outperforms two traditional alignment methods, Gale-Church and Bleualign, and achieves the highest BLEU score, 0.36.
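
To make the idea of BLEU-based sentence alignment concrete, the following is a minimal illustrative sketch in Python, not the algorithm evaluated in this paper. It assumes that the source sentences have already been machine-translated into the target language and that all sentences are pre-tokenized with spaces (Thai text would need a word segmenter first). The function names `sentence_bleu` and `best_matches`, the add-one smoothing, the bigram limit, and the 0.3 threshold are all hypothetical choices made for illustration. The explicit user feedback step proposed in this work is not modeled.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=2):
    """Smoothed, sentence-level BLEU-like score in (0, 1].

    Assumption: add-one smoothing and max_n=2 are illustrative choices,
    not the settings of any particular aligner.
    """
    if not candidate or not reference:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clipped overlap: each candidate n-gram counts at most as often
        # as it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        log_precision += math.log((overlap + 1) / (total + 1))
    # Brevity penalty for candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(log_precision / max_n)


def best_matches(translated_source, targets, threshold=0.3):
    """Pair each translated source sentence with its best-scoring target.

    Returns (source_index, target_index, score) tuples; sentences whose
    best score falls below the (hypothetical) threshold stay unaligned.
    """
    pairs = []
    if not targets:
        return pairs
    for i, sentence in enumerate(translated_source):
        scores = [sentence_bleu(sentence.split(), t.split()) for t in targets]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


if __name__ == "__main__":
    translated = ["the cat sat on the mat", "completely unrelated words here"]
    targets = ["a dog barked", "the cat sat on a mat", "nothing"]
    for src_idx, tgt_idx, score in best_matches(translated, targets):
        print(src_idx, tgt_idx, round(score, 3))
```

In this sketch, low-scoring sentences are simply left unpaired. A full aligner would additionally enforce monotonic ordering and handle one-to-many matches, which this example omits.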
