Abstract

This research addresses the out-of-vocabulary (OOV) problem caused by spelling errors in Uyghur text, with the aim of improving the quality of Uyghur–Chinese machine translation. This paper assesses three machine-translation-based spelling correction methods, which use (1) the Bilingual Evaluation Understudy (BLEU) score, (2) a Chinese language model, and (3) a bilingual language model. Correcting spelling with the BLEU score gave the best results in both the spelling correction task and the machine translation task: it reached a maximum F1 score of 0.72 for spelling correction and raised the translation BLEU score by 1.97 points over the baseline system. However, BLEU-based correction requires a bilingual parallel corpus; it is therefore a supervised method, suited to corpus pre-processing. Unsupervised spelling correction can be performed with either the Chinese language model or the bilingual language model, and these two methods can readily be extended to other languages, such as Arabic.
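The abstract does not spell out how the BLEU score drives correction. One plausible reading, sketched here under stated assumptions, is: generate candidate corrections of a source sentence, translate each candidate, and keep the one whose translation scores the highest BLEU against the reference translation in the parallel corpus (which is why this variant needs parallel data). The toy lexicon, the `toy_translate` stand-in for a real Uyghur–Chinese MT system, the add-one smoothing of higher-order precisions, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (Latin-transliterated Uyghur -> Chinese).
LEXICON = {"men": "我", "alma": "苹果", "yeymen": "吃"}


def toy_translate(sentence):
    """Hypothetical stand-in for an MT system: word-by-word lookup that copies
    unknown (OOV) words through, then moves the final verb after the subject
    to mimic SOV -> SVO order. Only meant for this toy example."""
    words = [LEXICON.get(w, w) for w in sentence.split()]
    if len(words) < 3:
        return words
    return words[:1] + words[-1:] + words[1:-1]


def ngram_counts(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU; add-one smoothing for n > 1 is an assumption."""
    if not hypothesis:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hypothesis, n)
        ref_counts = ngram_counts(reference, n)
        # Clipped matches: each n-gram counts at most as often as in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hypothesis) - n + 1, 0)
        if n > 1:  # smooth higher-order precisions so one miss is not fatal
            matches, total = matches + 1, total + 1
        if matches == 0:
            return 0.0
        log_precision += math.log(matches / total) / max_n
    # Brevity penalty for hypotheses shorter than the reference.
    ratio = len(reference) / len(hypothesis)
    brevity = 1.0 if ratio < 1 else math.exp(1 - ratio)
    return brevity * math.exp(log_precision)


def pick_correction(candidates, reference):
    """Keep the candidate whose translation scores highest against the
    reference translation from the parallel corpus."""
    return max(candidates, key=lambda c: sentence_bleu(toy_translate(c), reference))


candidates = ["men allma yeymen", "men alma yeymen", "men olma yeymen"]
reference = ["我", "吃", "苹果"]
print(pick_correction(candidates, reference))  # → men alma yeymen
```

The misspelled candidates leave an OOV token in the translation, which lowers their n-gram matches, so the correctly spelled candidate wins. A production setup would more likely use an established BLEU implementation and a real MT system in place of these toy helpers.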

Highlights

  • Out-of-vocabulary (OOV) words have always been a problem for machine translation, whether in traditional statistical machine translation (SMT) or in neural machine translation (NMT), the focus of recent research.

  • Extensive research on spelling correction has been undertaken to improve the accuracy of natural language processing tasks such as speech recognition [5] and machine translation.

  • To address this problem, we propose a spelling correction method based on a bilingual language model.
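The highlight does not say how the bilingual language model is built, so this sketch illustrates only the simpler unsupervised variant named in the abstract, a Chinese (target-side) language model, under these assumptions: each candidate correction is translated, the translation is scored by an add-one-smoothed bigram model trained on monolingual Chinese text, and the candidate with the most fluent translation is kept. No reference translation is needed. `BigramLM`, `toy_translate`, the lexicon, and the three-sentence corpus are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (Latin-transliterated Uyghur -> Chinese).
LEXICON = {"men": "我", "alma": "苹果", "yeymen": "吃"}


def toy_translate(sentence):
    """Hypothetical stand-in for an MT system: word-by-word lookup (OOV words
    copied through), then the final verb moved after the subject."""
    words = [LEXICON.get(w, w) for w in sentence.split()]
    if len(words) < 3:
        return words
    return words[:1] + words[-1:] + words[1:-1]


class BigramLM:
    """Add-one-smoothed bigram language model over tokenized sentences."""

    def __init__(self, corpus):
        self.history = Counter()  # how often each token precedes another
        self.bigrams = Counter()
        for sentence in corpus:
            tokens = ["<s>"] + sentence + ["</s>"]
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        # Possible next tokens: seen words, </s>, and one slot for unknown words.
        self.vocab_size = len({t for s in corpus for t in s}) + 2

    def log_prob(self, sentence):
        """Natural-log probability of a tokenized sentence."""
        tokens = ["<s>"] + sentence + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.history[a] + self.vocab_size))
            for a, b in zip(tokens[:-1], tokens[1:])
        )


def pick_correction_lm(candidates, lm):
    """Keep the candidate whose translation the target-side LM finds most
    fluent, averaged per bigram so longer outputs are not penalized."""
    def score(candidate):
        translation = toy_translate(candidate)
        return lm.log_prob(translation) / (len(translation) + 1)
    return max(candidates, key=score)


chinese_corpus = [["我", "吃", "苹果"], ["他", "吃", "苹果"], ["我", "喜欢", "苹果"]]
lm = BigramLM(chinese_corpus)
candidates = ["men allma yeymen", "men alma yeymen", "men olma yeymen"]
print(pick_correction_lm(candidates, lm))  # → men alma yeymen
```

A bigram model with add-one smoothing is chosen only for brevity; a real system would use a larger n-gram or neural language model trained on a substantial Chinese corpus.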

Summary

Introduction

Out-of-vocabulary (OOV) words have always been a problem for machine translation, whether in traditional statistical machine translation (SMT) or in neural machine translation (NMT), the focus of recent research. Spelling errors in the source text are one source of OOV words. Damerau proposed a rule-based method that used dictionary matching to detect and correct spelling errors [6]. There are two main types of spelling errors: non-word errors and real-word errors. A non-word error is a misspelling that produces a string that is not in the dictionary, that is, not a known word. Misspelling “apple” as “appll” is a non-word error because “appll” is not in the dictionary. A real-word error occurs when a misspelling produces another word that is in the dictionary.
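Dictionary matching and the non-word/real-word distinction can be illustrated with a short sketch. It is not the method of [6] or of this paper: it flags a token as a non-word error when it is absent from a hypothetical toy dictionary and proposes dictionary words within one edit under the optimal-string-alignment variant of the Damerau–Levenshtein distance, which counts insertions, deletions, substitutions, and transpositions of adjacent characters. The names `osa_distance`, `is_non_word`, and `candidates` are illustrative.

```python
def osa_distance(a, b):
    """Optimal-string-alignment distance (a restricted Damerau-Levenshtein
    distance): insertion, deletion, substitution and transposition of two
    adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def is_non_word(word, dictionary):
    """A non-word error: the token is not a known dictionary word."""
    return word not in dictionary


def candidates(word, dictionary, max_distance=1):
    """Dictionary words within max_distance edits, closest first."""
    scored = sorted((osa_distance(word, w), w) for w in dictionary)
    return [w for dist, w in scored if dist <= max_distance]


dictionary = {"apple", "apply", "ample", "maple"}
print(is_non_word("appll", dictionary))  # → True
print(candidates("appll", dictionary))   # → ['apple', 'apply']
print(is_non_word("apply", dictionary))  # → False
```

The last line shows the limitation of dictionary matching: if “apply” were typed where “apple” was meant, the real-word error would pass the check, so detecting it requires context beyond the dictionary.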
