Abstract
This research addresses the out-of-vocabulary problem caused by Uyghur spelling errors in Uyghur–Chinese machine translation, with the aim of improving translation quality. This paper assesses three spelling correction methods based on machine translation: (1) using a Bilingual Evaluation Understudy (BLEU) score; (2) using a Chinese language model; (3) using a bilingual language model. Using the BLEU score for spelling correction achieved the best results in both the spelling correction task and the machine translation task: spelling correction reached a maximum F1 score of 0.72, and the translation BLEU score increased by 1.97 points relative to the baseline system. However, BLEU-based spelling correction requires a bilingual parallel corpus; it is therefore a supervised method that can be used in corpus pre-processing. Unsupervised spelling correction can be performed with either a Chinese language model or a bilingual language model. These two methods can be easily extended to other languages, such as Arabic.
Highlights
In both traditional statistical machine translation (SMT) and the more recent neural machine translation (NMT), out-of-vocabulary (OOV) words have always been a problem affecting translation.
To improve the accuracy of natural language processing tasks such as speech recognition [5] and machine translation, a great deal of in-depth research on spelling correction has been undertaken.
To address the above problem, we propose a spelling error correction method that uses a bilingual language model.
Summary
In both traditional statistical machine translation (SMT) and the more recent neural machine translation (NMT), out-of-vocabulary (OOV) words have always been a problem affecting translation. Damerau proposed a rule-based method that used dictionary matching to check and correct spelling errors [6]. There are two main types of spelling errors: non-word errors and real-word errors. A non-word error is a misspelling that produces a string that is not in the dictionary and is not a known word. For example, misspelling “apple” as “appll” is a non-word error because “appll” is not in the dictionary. A real-word error is a misspelling that produces another word that is in the dictionary.
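The distinction between the two error types can be illustrated with a minimal Python sketch. It is not the paper's method: the toy English dictionary, the function names `classify_token`, `single_edit_variants`, and `candidates`, and the use of single-edit candidate generation are all assumptions made for illustration. Detecting a real-word error requires knowing the intended word, which the sketch takes as an optional argument; in practice that information would have to come from context.

```python
def classify_token(token, dictionary, intended=None):
    """Label a token as 'non-word error', 'real-word error', or 'correct'.

    `intended` is the word the writer meant (hypothetical input; a real
    system would have to infer it from context).
    """
    if token not in dictionary:
        return "non-word error"
    if intended is not None and token != intended:
        return "real-word error"
    return "correct"


def single_edit_variants(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Return all strings one deletion, transposition, substitution,
    or insertion away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    substitutes = [
        left + ch + right[1:] for left, right in splits if right for ch in alphabet
    ]
    inserts = [left + ch + right for left, right in splits for ch in alphabet]
    return set(deletes + transposes + substitutes + inserts)


def candidates(token, dictionary):
    """Dictionary words reachable from `token` by a single edit, sorted."""
    return sorted(single_edit_variants(token) & dictionary)


# Toy dictionary, assumed for illustration only.
DICTIONARY = {"apple", "apply", "form", "from"}

print(classify_token("appll", DICTIONARY))                 # not in dictionary
print(classify_token("form", DICTIONARY, intended="from"))  # valid but unintended
print(candidates("appll", DICTIONARY))
```

Dictionary lookup alone flags “appll” but cannot flag “form” written in place of “from”, and it cannot choose between several valid candidates, so an additional ranking signal is needed for both cases.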