Abstract

Multimodal Machine Translation (MMT) aims to enhance translation quality by incorporating information from other modalities (usually images). However, dominant MMT models do not account for the fact that visual features not only provide supplementary information but also introduce considerable noise. In this paper, we propose a visual feature filter to address this issue. Specifically, we adopt a soft-lookup function to select the visual features relevant to the text and then concatenate these features, as pseudo-words, with the text representation. In addition, our model performs two-pass decoding. The second pass amounts to polishing, which can identify errors in the draft translation: polishing expands the view available when decoding each target token and thus provides more contextual information. Moreover, since most words in a draft translation can be copied into the final translation, we further equip our model with a copying mechanism to preserve the words that do not need to be corrected. MMT has so far achieved success mainly in mainstream languages. To promote the development of MMT in low-resource languages such as Mongolian, we apply our model to the Mongolian→Chinese translation task and extend the Multi30k dataset with synthetic Mongolian and Chinese descriptions. Experiments on these synthetic Mongolian and Chinese datasets demonstrate that our model brings significant improvements.
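The sketch below illustrates one plausible reading of the soft-lookup visual feature filter described above: each source-text token attends over image region features, down-weighting irrelevant (noisy) regions, and the filtered visual features are appended to the text representation as pseudo-words. This is an illustrative assumption, not the paper's exact implementation; all layer names, dimensions, and the scaled dot-product attention choice are ours.

```python
import torch
import torch.nn as nn


class VisualFeatureFilter(nn.Module):
    """Soft-lookup filter (illustrative sketch, not the paper's code):
    selects visual features relevant to the text and appends them to the
    text representation as pseudo-word positions."""

    def __init__(self, d_model: int):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)  # projects text tokens to queries
        self.key = nn.Linear(d_model, d_model)    # projects visual regions to keys
        self.value = nn.Linear(d_model, d_model)  # projects visual regions to values

    def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # text:   (batch, len_text, d_model) -- encoded source sentence
        # visual: (batch, len_vis,  d_model) -- projected image region features
        q = self.query(text)
        k = self.key(visual)
        v = self.value(visual)

        # Soft lookup: each text token attends over visual regions, so
        # regions irrelevant to the text receive low weight (noise filtering).
        scores = torch.matmul(q, k.transpose(-1, -2)) / (q.size(-1) ** 0.5)
        weights = torch.softmax(scores, dim=-1)      # (batch, len_text, len_vis)
        selected_visual = torch.matmul(weights, v)   # text-relevant visual features

        # Concatenate the filtered visual features with the text
        # representation as extra "pseudo-word" positions.
        return torch.cat([text, selected_visual], dim=1)


if __name__ == "__main__":
    # Minimal usage example with random tensors.
    filt = VisualFeatureFilter(d_model=512)
    text = torch.randn(2, 20, 512)    # 20 source tokens
    visual = torch.randn(2, 49, 512)  # 49 image region features
    fused = filt(text, visual)
    print(fused.shape)                # torch.Size([2, 40, 512])
```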
