Abstract
Image captioning aims to describe the content of an image and plays a critical role in image understanding. Existing methods tend to generate text for distinct natural images; they perform poorly on paintings, which carry more abstract meaning, because objective parsing alone lacks the related knowledge. To alleviate this, we propose a novel cross-modality decoupled model that generates objective and subjective descriptions separately. Concretely, we propose to encode both the subjective semantics and the implied knowledge contained in paintings. The core of our framework is a pair of decoupled CLIP-based branches (DecoupleCLIP). For the objective caption branch, we use the CLIP model as the global feature extractor and construct a feature fusion module for global clues. On top of the objective caption branch structure, we add a multimodal fusion module, forming the artistic conception branch; in this way, the objective captions constrain the artistic conception content. Extensive experiments on our new dataset demonstrate the superior ability of DecoupleCLIP, which achieves nearly a 2% improvement in CIDEr over comparison models.
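The two-branch decoupling described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the dimensions, the weight matrices, and the function names (`feature_fusion`, `multimodal_fusion`) are hypothetical stand-ins, not the paper's actual architecture. The key structural idea shown is that the artistic conception branch consumes both the objective branch's fused clues and the objective caption features, so the objective output constrains the subjective one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the abstract does not specify them.
D_IMG, D_TXT, D_FUSE = 512, 512, 256

# Stand-in for frozen CLIP global image features.
clip_image_feat = rng.standard_normal(D_IMG)

def feature_fusion(img_feat, W):
    """Objective branch: project CLIP global features into a fused clue space."""
    return np.tanh(W @ img_feat)

def multimodal_fusion(img_clues, caption_feat, W_i, W_t):
    """Artistic conception branch: combine the objective branch's clues with
    the objective caption features, so the caption constrains the subjective
    (artistic conception) representation."""
    return np.tanh(W_i @ img_clues + W_t @ caption_feat)

# Randomly initialized stand-in weights (a real model would learn these).
W_obj = rng.standard_normal((D_FUSE, D_IMG))
W_i = rng.standard_normal((D_FUSE, D_FUSE))
W_t = rng.standard_normal((D_FUSE, D_TXT))

obj_clues = feature_fusion(clip_image_feat, W_obj)   # objective caption branch
caption_feat = rng.standard_normal(D_TXT)            # stand-in for decoded objective caption
art_state = multimodal_fusion(obj_clues, caption_feat, W_i, W_t)

print(obj_clues.shape, art_state.shape)
```

A real implementation would decode captions from each branch's state with a language decoder; this sketch only traces how information flows between the decoupled branches.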