Weakly Supervised Multimodal Affordance Grounding for Egocentric Images

Lingjing Xu, Yang Gao, Wenfeng Song, Aimin Hao

doi:10.1609/aaai.v38i6.28451

Abstract

To enhance the interaction between intelligent systems and the environment, locating the affordance regions of objects is crucial. These regions correspond to specific areas that provide distinct functionalities. Humans often acquire the ability to identify these regions through action demonstrations and verbal instructions. In this paper, we present a novel multimodal framework that extracts affordance knowledge from exocentric images, which depict human-object interactions, as well as from accompanying textual descriptions that describe the performed actions. The extracted knowledge is then transferred to egocentric images. To achieve this goal, we propose the HOI-Transfer Module, which utilizes local perception to disentangle individual actions within exocentric images. This module effectively captures localized features and correlations between actions, leading to valuable affordance knowledge. Additionally, we introduce the Pixel-Text Fusion Module, which fuses affordance knowledge by identifying regions in egocentric images that bear resemblances to the textual features defining affordances. We employ a Weakly Supervised Multimodal Affordance (WSMA) learning approach, utilizing image-level labels for training. Through extensive experiments, we demonstrate the superiority of our proposed method in terms of evaluation metrics and visual results when compared to existing affordance grounding models. Furthermore, ablation experiments confirm the effectiveness of our approach. Code: https://github.com/xulingjing88/WSMA.
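The idea of locating image regions that resemble a textual affordance feature can be illustrated with a small, generic sketch. This is not the authors' Pixel-Text Fusion Module; it only assumes that each pixel (or patch) of an egocentric image has a feature vector in the same space as a text embedding, and it scores each location by cosine similarity. The function names `cosine` and `pixel_text_heatmap`, the toy 2x2 feature grid, and the min-max normalization are all hypothetical choices made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def pixel_text_heatmap(pixel_feats, text_feat):
    """Score every location of an H x W grid of D-dim features against a D-dim
    text feature, then min-max scale the scores to [0, 1].

    Assumption: image and text features already live in a shared space.
    """
    sims = [[cosine(f, text_feat) for f in row] for row in pixel_feats]
    flat = [s for row in sims for s in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0.0:
        # All locations are equally similar; return a flat map.
        return [[0.0 for _ in row] for row in sims]
    return [[(s - lo) / span for s in row] for row in sims]

# Toy 2 x 2 "image" with 2-dim features (hypothetical values).
pixel_feats = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, 0.0]],
]
text_feat = [1.0, 0.0]  # stand-in for an embedded action description
heatmap = pixel_text_heatmap(pixel_feats, text_feat)
```

In this toy case the location whose feature points in the same direction as the text feature receives the highest score and the one pointing in the opposite direction receives the lowest. A real system would obtain the features from learned image and text encoders and train them with image-level labels, as the abstract describes.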

Published in: Proceedings of the AAAI Conference on Artificial Intelligence
