Abstract

Machine translation is a popular automation approach for translating texts between different languages. Although machine translation has traditionally focused on natural language alone, images can provide an additional source of information. However, there are presently two challenges: (i) the lack of an effective fusion method to handle the triangular mapping among image, text, and semantic knowledge; and (ii) the limited availability of large-scale parallel corpora for training a model to generate accurate machine translations. To address these challenges, this work proposes an effective multimodal information fusion method for automated machine translation based on semi-supervised learning. The method fuses textual and visual information to deliver automated machine translation. Specifically, the objective aligns and fuses the modalities in a multimodal attention network, which accurately maps text and image features to their shared semantic information. Moreover, semi-supervised learning is employed for its ability to use a small parallel corpus for supervised training on top of unsupervised training. Conducted on the Multi30k dataset, the experimental results show the promising performance of our proposed fusion method compared with state-of-the-art approaches.
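The abstract does not specify the fusion architecture in detail, so the following is only a minimal sketch of one plausible reading: a decoder state attends separately over source-text encoder states and image region features, and the two context vectors are combined with a learned gate. The class name, dimensions, head count, and gating scheme are all illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of multimodal attention fusion:
# a decoder state attends over text encoder states and image region
# features, and the two contexts are fused by a gated sum.
import torch
import torch.nn as nn

class MultimodalAttentionFusion(nn.Module):  # hypothetical module
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.image_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        # Gate deciding how much visual context to mix into the textual context.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, query, text_feats, image_feats):
        # query:       (batch, tgt_len, d_model)   decoder states
        # text_feats:  (batch, src_len, d_model)   source-sentence encoder states
        # image_feats: (batch, n_regions, d_model) image region features
        text_ctx, _ = self.text_attn(query, text_feats, text_feats)
        image_ctx, _ = self.image_attn(query, image_feats, image_feats)
        g = torch.sigmoid(self.gate(torch.cat([text_ctx, image_ctx], dim=-1)))
        return text_ctx + g * image_ctx  # fused multimodal context

# Toy usage with random tensors standing in for real encoder outputs.
fusion = MultimodalAttentionFusion()
out = fusion(torch.randn(2, 7, 512), torch.randn(2, 10, 512), torch.randn(2, 49, 512))
print(out.shape)  # torch.Size([2, 7, 512])
```

In this reading, the gate lets the model fall back to purely textual context when the image contributes little, which is one common way to make visual features optional rather than dominant.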
