When Pairs Meet Triplets: Improving Low-Resource Captioning via Multi-Objective Optimization

Yike Wu, Zhong Su, Shiwan Zhao, Xiaojie Yuan, Ying Zhang

doi:10.1145/3492325

Abstract

Image captioning for low-resource languages has attracted much attention recently. Researchers have proposed augmenting the low-resource caption dataset into (image, rich-resource language, low-resource language) triplets and developed a dual attention mechanism that exploits these triplets during training to improve performance. However, triplet datasets are usually small because they are costly to collect. Meanwhile, many large-scale datasets already exist that contain one pair from the triplet, such as caption datasets in the rich-resource language and translation datasets from the rich-resource language to the low-resource language. In this article, we revisit the caption-translation pipeline of the translation-based approach so that training can use not only the triplet dataset but also large-scale paired datasets. The pipeline is composed of two models: a caption model for the rich-resource language and a translation model from the rich-resource language to the low-resource language. Unfortunately, it is not trivial to benefit fully from incorporating both the triplet dataset and the paired datasets into the pipeline, because of the gap between the training and testing phases and the instability of the training process. We propose to jointly optimize the two models of the pipeline in an end-to-end manner to bridge the training-testing gap, and we introduce two auxiliary training objectives to stabilize training. Experimental results show that the proposed method improves significantly over state-of-the-art methods.
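As a rough illustration of the structure described in the abstract, the sketch below composes a caption model and a translation model into one pipeline and combines an end-to-end loss with two auxiliary losses as a weighted sum. The abstract does not say what the auxiliary objectives are or how the terms are weighted, so every name here (`run_pipeline`, `LossWeights`, `combined_objective`) and the default weights are hypothetical, and the weighted sum is only one simple way to combine multiple objectives.

```python
from dataclasses import dataclass
from typing import Callable


def run_pipeline(
    image: object,
    caption_model: Callable[[object], str],
    translation_model: Callable[[str], str],
) -> str:
    """Caption the image in the rich-resource language, then translate
    that caption into the low-resource language."""
    rich_caption = caption_model(image)
    return translation_model(rich_caption)


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical weights; the abstract does not give any values.
    joint: float = 1.0
    aux_first: float = 0.5
    aux_second: float = 0.5


def combined_objective(
    joint_loss: float,
    aux_first_loss: float,
    aux_second_loss: float,
    weights: LossWeights = LossWeights(),
) -> float:
    """Weighted sum of the end-to-end loss and two auxiliary losses.

    Assumption: a plain weighted sum stands in for whatever
    multi-objective scheme the article actually uses.
    """
    return (
        weights.joint * joint_loss
        + weights.aux_first * aux_first_loss
        + weights.aux_second * aux_second_loss
    )


if __name__ == "__main__":
    # Toy stand-ins for the two models, for illustration only.
    toy_caption = lambda image: "a dog on the grass"
    toy_translate = lambda text: text.upper()
    print(run_pipeline("image.jpg", toy_caption, toy_translate))
    print(combined_objective(2.0, 1.0, 3.0))
```

In a real system the loss terms would be differentiable tensors computed on the triplet and paired datasets rather than plain floats; the floats here only show how the terms fit together.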

More From: ACM Transactions on Multimedia Computing, Communications, and Applications
