Variational Structured Attention Networks for Deep Visual Representation Learning.

Guanglei Yang, Elisa Ricci, Mingli Ding, Paolo Rota, Xavier Alameda-Pineda, Dan Xu

doi:10.1109/tip.2021.3137647

Abstract

Convolutional neural networks have enabled major progress in pixel-level prediction tasks such as semantic segmentation, depth estimation, and surface normal prediction, benefiting from their powerful capabilities in visual representation learning. Typically, state-of-the-art models integrate attention mechanisms to improve deep feature representations. Recently, some works have demonstrated the significance of learning and combining both spatial- and channel-wise attention for deep feature refinement. In this paper, we aim to effectively boost previous approaches and propose a unified deep framework that jointly learns spatial attention maps and channel attention vectors in a principled manner, so as to structure the resulting attention tensors and model the interactions between these two types of attention. Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework, leading to VarIational STructured Attention networks (VISTA-Net). We implement the inference rules within the neural network, thus allowing end-to-end learning of both the probabilistic parameters and the CNN front-end parameters. As demonstrated by our extensive empirical evaluation on six large-scale datasets for dense visual prediction, VISTA-Net outperforms the state of the art in multiple continuous and discrete prediction tasks, confirming the benefit of the proposed approach to joint structured spatial-channel attention estimation for deep representation learning. The code is available at https://github.com/ygjwd12345/VISTA-Net.
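To make the idea of a structured attention tensor concrete, the following is a minimal sketch, not the VISTA-Net inference procedure. It assumes, purely for illustration, that a C x H x W attention tensor is formed as the outer product of a channel attention vector and a spatial attention map, each squashed by a sigmoid, and that the tensor gates the features element-wise. The function names `structured_attention` and `apply_attention` are hypothetical and do not come from the released code.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def structured_attention(channel_logits, spatial_logits):
    """Build a C x H x W attention tensor from a length-C channel vector
    and an H x W spatial map via an outer product.

    Assumption for illustration only: the abstract does not specify
    this exact form.
    """
    channel = [sigmoid(v) for v in channel_logits]
    spatial = [[sigmoid(v) for v in row] for row in spatial_logits]
    return [[[c * s for s in row] for row in spatial] for c in channel]


def apply_attention(features, attention):
    """Gate a C x H x W feature tensor element-wise with the attention."""
    return [
        [[f * a for f, a in zip(f_row, a_row)]
         for f_row, a_row in zip(f_ch, a_ch)]
        for f_ch, a_ch in zip(features, attention)
    ]


# Toy input: 2 channels, 2 x 2 spatial grid.
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[5.0, 6.0], [7.0, 8.0]]]
channel_logits = [0.0, 2.0]                 # second channel weighted higher
spatial_logits = [[0.0, 0.0], [0.0, 0.0]]   # uniform spatial attention

attention = structured_attention(channel_logits, spatial_logits)
refined = apply_attention(features, attention)
```

Because every entry of the tensor is a product of one channel weight and one spatial weight, the two kinds of attention interact multiplicatively instead of being applied as two unrelated gates. The paper estimates these quantities within a probabilistic framework, which this sketch does not attempt to reproduce.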

Journal: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Publication Date: Jan 1, 2024
Citations: 3
License type: publisher-specific-oa

Similar Papers

Unsupervised Visual Representation Learning via Dual-Level Progressive Similar Instance Selection.
Hehe Fan ... Ping Liu
IEEE transactions on cybernetics | VOL. 52
01 Sep 2022

Galaxy mergers in Subaru HSC-SSP: A deep representation learning approach for identification, and the role of environment on merger incidence
Kiyoaki Christopher Omori ... Xuheng Ding
Astronomy & Astrophysics | VOL. 679
01 Nov 2023

Automated Vulnerability Detection in Source Code Using Deep Representation Learning
Rebecca Russell ... Tomo Lazovich
01 Dec 2018

Incorporating Visual Information in Audio Based Self-Supervised Speaker Recognition
Danwei Cai ... Ming Li
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 30
01 Jan 2021
