Abstract

In the context of the rapid development of multimedia and information technology, machine translation plays an indispensable role in cross-border e-commerce between China and Japan. However, because natural languages are complex and diverse, a single neural machine translation model tends to fall into local optima, resulting in poor accuracy. To address this problem, this paper proposes a general multimodal machine translation model based on visual information. First, visual and textual information are combined to generate a visual representation that perceives the text. Then, the two modalities are encoded separately, and a gating network controls the proportion of visual information within the overall multimodal representation. Finally, experiments are conducted on the image-captioning datasets MSCOCO and Flickr30k and on the video dataset VATEX. The results show that the proposed algorithm achieves the best performance on both the BLEU and METEOR evaluation metrics.
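The gated fusion described above can be sketched as a per-dimension sigmoid gate over the two encoder outputs. This is a minimal illustrative sketch, not the paper's actual architecture: the hidden size, the weight matrix `W_g`, and the random encoder outputs are all assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 8                              # hidden size (illustrative)
h_text = rng.standard_normal(d)    # text encoder output (stand-in)
h_vis = rng.standard_normal(d)     # visual encoder output (stand-in)

# Gating network (hypothetical parameters): a projection of both
# modalities decides, per dimension, how much visual information
# enters the fused multimodal state.
W_g = rng.standard_normal((d, 2 * d)) * 0.1
g = sigmoid(W_g @ np.concatenate([h_text, h_vis]))  # gate values in (0, 1)

# Fused representation: the gate scales the visual share, and its
# complement scales the textual share.
h_fused = g * h_vis + (1.0 - g) * h_text
print(h_fused.shape)
```

In such a scheme the gate is learned jointly with the encoders, so the model can suppress visual information when the image or video adds little to the translation.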

