Abstract

Translation between the world's languages has become indispensable for communication among people around the globe. Advances in computer technology have moved machine translation from academic research into industrial applications. Deep learning, a newer and popular branch of machine learning, has achieved excellent results in research fields such as natural language processing. This paper improves the performance of machine translation based on a deep learning network and studies the intelligent recognition of English-Chinese machine translation models. The research focuses first on the out-of-vocabulary (OOV) problem in machine translation, which arises with unregistered and rare words. It combines stemming with the data compression algorithm Byte Pair Encoding (BPE) and proposes a subword-based word sequence segmentation method. With this method, the English text is segmented into sequences of subword units, while the Chinese text is segmented into sequences of individual Chinese characters using a unigram approach. Second, to keep the decoder from producing incomplete translations, the work adopts a deep-attention mechanism that improves the decoder's ability to obtain context information. Inspired by the traditional attention calculation process, the improved attention uses a two-layer calculation structure to focus on the connection between the context vectors at different decoder time steps. Building on the Google Neural Machine Translation (GNMT) model, the paper evaluates these improvements experimentally on three datasets of different scales. The experimental results indicate that the improved method alleviates the OOV problem and improves translation accuracy.
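To make the segmentation idea in the abstract more concrete, the following is a minimal sketch of plain BPE on English words together with character-level splitting of Chinese text. It is an illustration under stated assumptions, not the paper's implementation: the stemming step the paper combines with BPE is omitted, and the function names (`learn_bpe`, `merge_pair`, `segment_english`, `segment_chinese`), the `</w>` end-of-word marker, and the toy corpus are all hypothetical choices made here.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn an ordered list of BPE merge rules from a list of words (toy version)."""
    # Each word starts as a tuple of characters plus an end-of-word marker (an assumption).
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair
        merges.append(best)
        # Apply the new merge to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment_english(word, merges):
    """Segment one word into subword units by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def segment_chinese(text):
    """Split Chinese text into single characters, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]


if __name__ == "__main__":
    corpus = ["low", "low", "lower", "lowest"]  # toy corpus, not from the paper
    merges = learn_bpe(corpus, num_merges=2)
    print(segment_english("lowly", merges))  # an unseen word still decomposes into known units
    print(segment_chinese("机器 翻译"))
```

Because an unseen word can always fall back to smaller units (ultimately single characters), a subword vocabulary of this kind leaves no word entirely out of vocabulary, which is the property the abstract relies on.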

Highlights

  • Language is the most important bridge in communication

  • On NLPCC2019 and OPUS, once training reaches 300 epochs, the loss has almost stopped decreasing and the BLEU score has almost stopped rising, which indicates that the deep learning network has converged

  • To verify the validity and correctness of the proposed method, we compare it with other English machine translation methods on the NLPCC2019 and OPUS datasets


Summary

Introduction

With the rapid development of modern society and the gradual progress of global integration, translation between the world's languages has become indispensable for communication among people all over the world. As the global economy has developed, Chinese and English have become the two most influential languages. English-Chinese translation is a shortcut to cross-language communication and plays an important role in global integration. Machine translation is an applied research area and one of the important branches of natural language processing (NLP). As a benchmark task in NLP, machine translation research drives research and development in other branches of the field [1,2,3,4,5].
