Abstract
Machine translation is a key technology in the field of artificial intelligence and is expected to be widely used in human-computer interaction services for the Internet of Things (IoT). This paper presents a minimum Bayes-risk (MBR) phrase table pruning method for pivot-based statistical machine translation (SMT). An SMT system requires a large amount of bilingual data to build a high-performance translation model. For some language pairs, such as Chinese-English, massive bilingual data are available on the web. However, for most language pairs, large-scale bilingual data are hard to obtain. Pivot-based SMT was proposed to address this data scarcity problem: it introduces a pivot language to bridge the source and target languages. With the pivot-based approach, a source-target translation model can thus be derived from well-trained source-pivot and pivot-target translation models. However, because of ambiguities in the pivot language, source and target phrases with different meanings may be wrongly matched, so the derived source-target phrase table may contain incorrect phrase pairs. To alleviate this problem, we apply the MBR method to prune the phrase table. The MBR pruning method removes the phrase pairs with the lowest risk from the phrase table. Experimental results on Europarl data show that the proposed method can both reduce the size of phrase tables and improve translation performance. This study also provides a useful reference for many IoT research fields and smart web services.
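To make the pivot step concrete, the following is a minimal sketch of how a source-target phrase table might be derived from source-pivot and pivot-target tables. It assumes the commonly used triangulation formula, p(t|s) = Σ_p p(p|s) · p(t|p), which the abstract does not state explicitly. The function name `triangulate`, the German-English-French toy phrases, and all probabilities are hypothetical and chosen only to show how an ambiguous pivot phrase ("bank") can produce a wrong pair.

```python
from collections import defaultdict


def triangulate(src_pvt, pvt_tgt):
    """Derive p(t|s) by summing p(p|s) * p(t|p) over shared pivot phrases.

    Assumed triangulation formula; not taken from the paper itself.
    """
    src_tgt = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_pvt.items():
        for p, prob_p_given_s in pivots.items():
            # Pivot phrases missing from the pivot-target table contribute nothing.
            for t, prob_t_given_p in pvt_tgt.get(p, {}).items():
                src_tgt[s][t] += prob_p_given_s * prob_t_given_p
    return {s: dict(targets) for s, targets in src_tgt.items()}


# Toy tables with made-up probabilities (German -> English -> French).
src_pvt = {
    "Ufer": {"bank": 0.6, "shore": 0.4},  # river bank
    "Bank": {"bank": 1.0},                # financial institution
}
pvt_tgt = {
    "bank": {"banque": 0.7, "rive": 0.3},
    "shore": {"rive": 1.0},
}

table = triangulate(src_pvt, pvt_tgt)
for s, targets in table.items():
    for t, prob in sorted(targets.items()):
        print(f"{s} -> {t}: {prob:.2f}")
```

In this toy example the ambiguous pivot phrase "bank" links "Ufer" (river bank) to "banque" (financial bank) with substantial probability. This is the kind of incorrect phrase pair that a pruning step would aim to remove.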