Multimodal Machine Translation Research Articles

Domain-specific Multi-modal Neural Machine Translation (DMNMT) aims to translate domain-specific sentences from a source language to a target language by incorporating text-related visual information. Domain-specific text and image data often complement each other and can jointly enhance the representation of domain-specific information. However, there is a considerable modality gap between image and text in both data format and semantic expression, which creates distinctive challenges for domain-specific translation tasks. Narrowing the modality gap and improving domain-aware representation are therefore two critical challenges in DMNMT. To this end, this paper proposes a progressive modality-complement aggregative MultiTransformer, which aims to simultaneously narrow the modality gap and capture domain-specific multi-modal representations. We first adopt a bidirectional progressive cross-modal interactive strategy that narrows the text-to-text, text-to-visual, and visual-to-text semantic gaps in the multi-modal representation space by integrating visual and text information layer by layer. Subsequently, we introduce a modality-complement MultiTransformer based on progressive cross-modal interaction to extract the domain-related multi-modal representation, thereby enhancing machine translation performance. Experiments on the Fashion-MMT and Multi-30k datasets show that the proposed approach outperforms the compared state-of-the-art (SOTA) methods on the En-Zh task in the e-commerce domain and on the En-De, En-Fr, and En-Cs tasks of Multi-30k in the general domain. The in-depth analysis confirms the validity of the proposed modality-complement MultiTransformer and bidirectional progressive cross-modal interactive strategy for DMNMT.
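The layer-by-layer, bidirectional interaction described in this abstract can be pictured with a minimal sketch. This is not the authors' architecture: it assumes plain scaled dot-product attention with no learned projections, residual updates, and toy two-dimensional features, and it covers only the text-to-visual and visual-to-text directions. The function names (`cross_attend`, `progressive_interaction`) are hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Replace each query vector with a softmax-weighted average of
    keys_values, scored by scaled dot product (no learned weights)."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended.append([
            sum(w * kv[i] for w, kv in zip(weights, keys_values))
            for i in range(dim)
        ])
    return attended


def residual_add(base, update):
    """Element-wise sum of two equally shaped lists of vectors."""
    return [[b + u for b, u in zip(bv, uv)] for bv, uv in zip(base, update)]


def progressive_interaction(text, image, num_layers=2):
    """Assumed scheme: at every layer, text attends to image and image
    attends to text, and both streams are updated before the next layer."""
    for _ in range(num_layers):
        text_update = cross_attend(text, image)    # text-to-visual
        image_update = cross_attend(image, text)   # visual-to-text
        text = residual_add(text, text_update)
        image = residual_add(image, image_update)
    return text, image


# Toy inputs: 3 token vectors and 2 image-region vectors.
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_feats = [[0.5, 0.5], [1.0, -1.0]]
fused_text, fused_image = progressive_interaction(text_feats, image_feats)
```

Both streams keep their original lengths (three token vectors, two region vectors), so a downstream encoder could consume either one. A real system would add learned query, key, and value projections, normalization, and the text-to-text path that the abstract also mentions.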

The video modality is an emerging research area among the many modalities used for multimodal machine translation. Multimodal machine translation uses multiple modalities to improve the translation from the source language into the target language. However, the currently available multimodal datasets focus on a few well-studied languages. In this paper, we propose a video-guided multimodal machine translation (VMMT) model under a low-resource setting by building a synthetic multimodal dataset for the English-Hindi language pair, the first of its kind for this pair. The VMMT system employs spatio-temporal video context as an additional input modality along with the source text. The spatio-temporal video context is extracted using a pre-trained 3D convolutional neural network. We report how well the VMMT systems outperform the text-only neural machine translation (NMT) system using automatic evaluation metrics and human evaluation on two test datasets: one in-domain and one out-of-domain. Our results indicate that using video context as an additional input modality helps the MT system resolve challenges such as rare words and ambiguity in both English→Hindi and Hindi→English translation. Our experimental results show a significant improvement of up to +4.2 BLEU and +0.07 chrF in English→Hindi and +5.4 BLEU and +0.07 chrF in Hindi→English with our VMMT system over the unimodal NMT system. Our findings highlight the potential of visual cues as an additional modality for improving machine translation systems, especially in low-resource settings, and emphasize the importance of synthetic multimodal datasets in addressing the scarcity of diverse data for less-studied language pairs.
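For readers unfamiliar with the chrF figures quoted in this abstract, the following is a simplified sketch of a character n-gram F-score on a 0 to 1 scale. It is not the scorer the authors used and will not match standard tooling exactly: it assumes whitespace is dropped, n-gram orders 1 to 6 are averaged uniformly, and recall is weighted with beta = 2. The name `simple_chrf` is hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after removing spaces (an assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Average n-gram precision and recall over orders 1..max_n, then
    combine them into an F-beta score that favors recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one string is shorter than n characters
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Toy usage: compare two candidate translations against one reference.
reference = "the cat sat on the mat"
close_score = simple_chrf("the cat sat on a mat", reference)
far_score = simple_chrf("dogs run", reference)
```

Because the score works on characters rather than words, a hypothesis with a slightly different word form still earns partial credit, which is one reason character-level metrics are commonly reported alongside BLEU for morphologically rich languages.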

Articles published on Multimodal Machine Translation

Speech recognition and intelligent translation under multimodal human–computer interaction system

Multimodal Machine Translation Based on Enhanced Knowledge Distillation and Feature Fusion

Encoder–Decoder Calibration for Multimodal Machine Translation

CLIP-enhanced multimodal machine translation: integrating visual and label features with transformer fusion

Multimodal Machine Translation Approaches for Indian Languages: A Comprehensive Survey

Sanskrit to Hindi language translation using multimodal neural machine translation

Dose multimodal machine translation can improve translation performance?

Unsupervised Multimodal Machine Translation for Low-resource Distant Language Pairs

Multi-Modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models

Progressive modality-complement aggregative multitransformer for domain multi-modal neural machine translation

Multimodal Machine Translation

Multi-modal graph contrastive encoding for neural machine translation

Contrastive Adversarial Training for Multi-Modal Machine Translation

Morphology & word sense disambiguation embedded multimodal neural machine translation system between Sanskrit and Malayalam

Do cues in a video help in handling rare words in a machine translation system under a low-resource setting?

Learning to decode to future success for multi-modal neural machine translation

Multi-modal simultaneous machine translation fusion of image information

Adding visual attention into encoder-decoder model for multi-modal machine translation

Hindi to English Multimodal Machine Translation on News Dataset in Low Resource Setting

Layer-Level Progressive Transformer With Modality Difference Awareness for Multi-Modal Neural Machine Translation
