Abstract

Recent artificial intelligence research has shown great interest in automatically generating text descriptions of images, a problem known as image captioning. Remarkable success has been achieved in domains where a large amount of paired image–text data is available. Nevertheless, annotating sufficient data is labor-intensive and time-consuming, which poses a significant barrier to adapting image captioning systems to new domains. In this study, we introduce a novel Multitask Learning Algorithm for cross-Domain Image Captioning (MLADIC). MLADIC is a multitask system that simultaneously optimizes two coupled objectives via a dual learning mechanism: image captioning and text-to-image synthesis, with the hope that leveraging the correlation between the two dual tasks will improve image captioning performance in the target domain. Concretely, the image captioning task is trained with an encoder–decoder model (i.e., CNN-LSTM) to generate textual descriptions of input images. The image synthesis task employs a conditional generative adversarial network (C-GAN) to synthesize plausible images from text descriptions. In C-GAN, a generative model $G$ synthesizes plausible images given text descriptions, and a discriminative model $D$ tries to distinguish the images in the training data from the images generated by $G$. This adversarial process eventually guides $G$ to generate plausible, high-quality images. To bridge the gap between domains, a two-step strategy is adopted to transfer knowledge from the source domain to the target domain. First, we pre-train the model on the sufficient labeled source-domain data to learn the alignment between the neural representations of images and those of text. Second, we fine-tune the learned model by leveraging the limited image–text pairs and the unpaired data in the target domain. We conduct extensive experiments to evaluate MLADIC, using MSCOCO as the source domain and Flickr30k and Oxford-102 as the target domains. The results demonstrate that MLADIC substantially outperforms strong competitors on the cross-domain image captioning task.
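The two coupled objectives described above can be illustrated with a short PyTorch sketch: a CNN-LSTM captioner for the image-to-text direction, and a conditional generator/discriminator pair for the text-to-image direction. This is a minimal illustrative sketch, not the authors' implementation; the module sizes, the ResNet-18 encoder, the flat 64×64 image representation, the stand-in sentence embedding, and the toy training step are all assumptions made for brevity.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class Captioner(nn.Module):
    """Image captioning branch: CNN encoder + LSTM decoder (image -> text)."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        cnn = models.resnet18(weights=None)
        cnn.fc = nn.Linear(cnn.fc.in_features, embed_dim)    # image feature -> embedding
        self.encoder = cnn
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images).unsqueeze(1)             # (B, 1, E) image "token"
        x = torch.cat([feats, self.embed(captions)], dim=1)   # teacher forcing
        h, _ = self.lstm(x)
        return self.out(h[:, :-1, :])                         # next-word logits (B, T, V)


class Generator(nn.Module):
    """Text-to-image branch G: text embedding + noise -> synthesized image."""
    def __init__(self, text_dim=256, noise_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim + noise_dim, 512), nn.ReLU(),
            nn.Linear(512, 3 * 64 * 64), nn.Tanh())

    def forward(self, text_emb, noise):
        return self.net(torch.cat([text_emb, noise], dim=1)).view(-1, 3, 64, 64)


class Discriminator(nn.Module):
    """Conditional discriminator D: (image, text embedding) -> real/fake logit."""
    def __init__(self, text_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * 64 * 64 + text_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 1))

    def forward(self, images, text_emb):
        return self.net(torch.cat([images.flatten(1), text_emb], dim=1))


if __name__ == "__main__":
    B, V = 4, 1000
    images = torch.randn(B, 3, 64, 64)                   # toy image batch
    captions = torch.randint(0, V, (B, 12))               # toy caption batch
    text_emb = torch.randn(B, 256)                        # stand-in sentence embedding

    captioner, G, D = Captioner(V), Generator(), Discriminator()

    # Captioning loss: cross-entropy over next-word predictions.
    logits = captioner(images, captions)
    cap_loss = nn.CrossEntropyLoss()(logits.reshape(-1, V), captions.reshape(-1))

    # Adversarial losses for the text-to-image direction.
    fake = G(text_emb, torch.randn(B, 100))
    bce = nn.BCEWithLogitsLoss()
    d_real, d_fake = D(images, text_emb), D(fake.detach(), text_emb)
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    g_loss = bce(D(fake, text_emb), torch.ones_like(d_fake))
    print(cap_loss.item(), d_loss.item(), g_loss.item())
```

In the full method, the two directions share information and are optimized jointly, pre-trained on the source domain and then fine-tuned on the target domain; the sketch only shows the per-task losses being computed independently.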
