Abstract

With the rapid development of multimedia and information technology, machine translation plays an indispensable role in cross-border e-commerce between China and Japan. However, owing to the complexity and diversity of natural language, a single neural machine translation model tends to fall into local optima, leading to poor accuracy. To address this problem, this paper proposes a general multimodal machine translation model based on visual information. First, visual and textual information are combined to generate a visual representation that perceives the text. Then, the two modalities are encoded separately, and a gating network controls the proportion of visual information within the overall multimodal representation. Finally, experiments are conducted on the image-description datasets MSCOCO and Flickr30k and the video dataset VATEX. The results show that the proposed algorithm achieves the best performance on both the BLEU and METEOR evaluation metrics.
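The gated fusion mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact architecture: the sigmoid gate form, the vector dimensions, and the convex blend of the two encodings are all assumptions used only to show how a gating network can control the proportion of visual information.

```python
import numpy as np

def gated_fusion(text_repr, visual_repr, W_g, b_g):
    """Blend text and visual encodings with a learned gate.

    The gate g = sigmoid(W_g @ [text; visual] + b_g) lies in (0, 1)
    and controls the share of visual information in the fused vector:
        fused = (1 - g) * text + g * visual
    """
    concat = np.concatenate([text_repr, visual_repr])
    g = 1.0 / (1.0 + np.exp(-(W_g @ concat + b_g)))  # elementwise sigmoid gate
    return (1.0 - g) * text_repr + g * visual_repr

# Toy example: 4-dimensional encodings and random gate parameters.
rng = np.random.default_rng(0)
d = 4
text = rng.standard_normal(d)
visual = rng.standard_normal(d)
W_g = rng.standard_normal((d, 2 * d))
b_g = np.zeros(d)

fused = gated_fusion(text, visual, W_g, b_g)
print(fused.shape)
```

Because the gate is a convex weight in (0, 1), each fused component stays between the corresponding text and visual components, which is what lets the network smoothly dial visual information up or down.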
