Abstract

The Shan are said to be the second-largest ethnic group of Myanmar. The main motivation of this work is to break down the communication barrier between Shan people and Myanmar people. This paper contributes the first evaluation of the quality of machine translation between Myanmar (Burmese) and Shan (Tai Long). We also built a Myanmar-Shan parallel corpus (around 11K sentences) based on the Myanmar-language portion of the ASEAN MT corpus. Three statistical machine translation approaches were used in the experiments: phrase-based, hierarchical phrase-based, and the operation sequence model. Furthermore, two segmentation schemes were studied: syllable segmentation and word segmentation. Syllable segmentation achieved higher translation quality for both Myanmar and Shan. BLEU and RIBES scores were used to measure translation performance. The operation sequence model gave the highest scores (41.85 BLEU and 0.88031 RIBES) for Shan-to-Myanmar syllable translation. For Myanmar-to-Shan syllable translation, hierarchical phrase-based machine translation gave the highest BLEU score (34.72) and the operation sequence model gave the highest RIBES score (0.87012). Our experiments with syllable segmentation produced promising results even with limited data, and we expect that this approach can be developed into a useful translation system as more data becomes available.


Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call