Abstract

Cross-domain image captioning, in which a model is trained on a source domain and then generalized to other domains, usually suffers from a large domain shift. Although prior work has attempted to leverage both paired source data and unpaired target data to minimize this shift, the performance remains unsatisfactory. One main reason is the large discrepancy in language expression between the two domains: diverse language styles are adopted to describe an image from different viewpoints, yielding semantically different descriptions of the same image. To tackle this problem, this paper proposes a Style-based Cross-domain Image Captioner (SCIC), which incorporates discriminative style information into the encoder-decoder framework and interprets an image as a specially styled sentence according to external style instructions. Technically, we design a novel "Instruction-based LSTM" (I-LSTM), which adds an instruct gate to collect a style instruction and then produces output in the format that instruction specifies. Two objectives are designed to train I-LSTM: 1) generating correct image descriptions and 2) generating correct styles, so that the model is expected to accurately capture the semantic meaning of an image through the styled caption as well as understand the caption's syntactic structure. We use MS-COCO as the source domain, and Oxford-102, CUB-200, and Flickr30k as the target domains. Experimental results demonstrate that our model consistently outperforms previous methods, and that the style information incorporated via I-LSTM significantly improves performance, with at least a 5% CIDEr improvement on all datasets.
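The abstract names an "instruct gate" that injects a style instruction into the LSTM cell but gives no equations. The following is a minimal sketch of one plausible formulation, assuming standard LSTM gating plus an extra gate driven by a style embedding; all weight names (`Wi`, `Wf`, `Wo`, `Wg`, `Wn`, `Ws`) and the exact gating equations are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ilstm_step(x, s, h_prev, c_prev, W):
    """One step of a hypothetical Instruction-based LSTM cell.

    x: word embedding, s: style-instruction embedding,
    h_prev / c_prev: previous hidden and cell states,
    W: dict of weight matrices (names are illustrative, not from the paper).
    """
    z = np.concatenate([x, h_prev])
    i = sigmoid(W["Wi"] @ z)      # input gate
    f = sigmoid(W["Wf"] @ z)      # forget gate
    o = sigmoid(W["Wo"] @ z)      # output gate
    g = np.tanh(W["Wg"] @ z)      # candidate cell update
    # Assumed "instruct gate": lets the style instruction s modulate
    # what is written into the cell state.
    n = sigmoid(W["Wn"] @ np.concatenate([s, h_prev]))
    c = f * c_prev + i * g + n * np.tanh(W["Ws"] @ s)
    h = o * np.tanh(c)
    return h, c

# Toy usage with random weights (d = hidden size = embedding size).
rng = np.random.default_rng(0)
d = 8
W = {k: rng.standard_normal((d, 2 * d)) * 0.1
     for k in ("Wi", "Wf", "Wo", "Wg", "Wn")}
W["Ws"] = rng.standard_normal((d, d)) * 0.1
h, c = ilstm_step(rng.standard_normal(d), rng.standard_normal(d),
                  np.zeros(d), np.zeros(d), W)
```

In this sketch the instruct gate `n` is computed from the style embedding and the previous hidden state, so the same word sequence can be decoded into different surface styles simply by swapping `s`; that mirrors the abstract's claim that the decoder outputs a specified format according to the instruction.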
