Viewpoint-Adaptive Representation Disentanglement Network for Change Captioning.

Yunbin Tu, Liang Li, Qingming Huang, Junping Du, Ke Lu, Li Su

doi:10.1109/tip.2023.3268004

Abstract

Change captioning aims to describe the fine-grained change between a pair of images. Pseudo changes caused by viewpoint changes are the most typical distractors in this task, because they perturb and shift the features of the same objects and thus overwhelm the representation of the real change. In this paper, we propose a viewpoint-adaptive representation disentanglement network that distinguishes real changes from pseudo changes and explicitly captures the features of change to generate accurate captions. Concretely, a position-embedded representation learning module is devised to help the model adapt to viewpoint changes by mining the intrinsic properties of the two image representations and modeling their position information. To learn a reliable change representation that can be decoded into a natural language sentence, an unchanged representation disentanglement module is designed to identify and disentangle the unchanged features between the two position-embedded representations. Extensive experiments show that the proposed method achieves state-of-the-art performance on four public datasets. The code is available at https://github.com/tuyunbin/VARD.
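
To make the two ideas in the abstract more concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation (see the repository linked above for that). It assumes a standard sinusoidal position encoding and a simple softmax attention to estimate the "unchanged" part of each region, then subtracts that part to leave a change representation. All function names (`position_encoding`, `embed_positions`, `unchanged_part`, `change_representation`) are hypothetical and chosen only for illustration.

```python
import math


def position_encoding(index, dim):
    """Sinusoidal encoding for one region index (an assumption, not from the paper)."""
    return [
        math.sin(index / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(index / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def embed_positions(features):
    """Add a position encoding to every region feature vector."""
    return [
        [f + p for f, p in zip(vec, position_encoding(i, len(vec)))]
        for i, vec in enumerate(features)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unchanged_part(query_feats, other_feats):
    """For each query region, an attention-weighted mix of the other image's regions.

    This stands in for the 'unchanged' features shared by both images.
    """
    shared = []
    for q in query_feats:
        weights = softmax([dot(q, k) / math.sqrt(len(q)) for k in other_feats])
        shared.append(
            [sum(w * k[d] for w, k in zip(weights, other_feats)) for d in range(len(q))]
        )
    return shared


def change_representation(before, after):
    """Position-embed both images, then remove the shared part from 'before'."""
    before_pe = embed_positions(before)
    after_pe = embed_positions(after)
    shared = unchanged_part(before_pe, after_pe)
    return [[b - s for b, s in zip(bv, sv)] for bv, sv in zip(before_pe, shared)]
```

For example, `change_representation([[1.0, 2.0]], [[1.0, 2.0]])` compares two identical single-region images, so nothing is left after the shared part is removed. A learned model would replace the fixed encoding and the raw dot-product attention with trained layers; the sketch only shows the order of operations.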

Journal: IEEE Transactions on Image Processing
Publication Date: Jan 1, 2023
Citations: 5

Similar Papers

Semantic Relation-aware Difference Representation Learning for Change Captioning
01 Aug 2021

SMART: Syntax-Calibrated Multi-Aspect Relation Transformer for Change Captioning.
Yunbin Tu ... Zheng-Jun Zha
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 46
01 Jul 2024

Conditional Feature Embedding by Visual Clue Correspondence Graph for Person Re-Identification.
Fufu Yu ... Wei-Shi Zheng
IEEE Transactions on Image Processing | VOL. 31
01 Jan 2021

DG‐based SPO tuple recognition using self‐attention M‐Bi‐LSTM
Joon‐Young Jung
ETRI Journal | VOL. 44
29 Nov 2021
