Abstract

Given an object of interest, visual navigation aims to reach the object's location from a sequence of partial observations. To do so, an agent must 1) acquire, during training, knowledge of how object categories relate to one another in the world and 2) locate the target object in the current, unseen environment using those pre-learned category relations together with its observation trajectory. In this paper, we propose a Category Relation Graph (CRG) to learn the layout relations among object categories and a Temporal-Spatial-Region attention (TSR) architecture to capture long-term spatio-temporal dependencies among objects, aiding navigation. The CRG learns prior knowledge of object layout and infers the positions of specific objects. Building on it, the TSR architecture captures relationships among objects across time, space, and image regions within the observation trajectory. Specifically, a Temporal attention module (T) models the temporal structure of the observation sequence, implicitly encoding the agent's movement and trajectory history. A Spatial attention module (S) then uncovers the spatial context of objects in the current observation, conditioned on the CRG and past observations. Finally, a Region attention module (R) shifts attention to the target-relevant region. With the visual representation extracted by our method, the agent perceives the environment accurately and learns a superior navigation policy more easily. Experiments on AI2-THOR show that CRG-TSR significantly outperforms existing methods in both effectiveness and efficiency. The code is included in the supplementary material and will be made publicly available.
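The abstract describes a three-stage attention pipeline (T, then S, then R) but gives no implementation details. The following is a minimal, illustrative sketch in PyTorch-style Python of how such a stack might be wired; every module name, shape, and connection here is an assumption for exposition, not the authors' actual implementation.

```python
# Hypothetical sketch of the T -> S -> R attention stack described in the
# abstract. Names, dimensions, and wiring are illustrative assumptions only.
import torch
import torch.nn as nn

class TSRAttention(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        # T: self-attention over the observation sequence (temporal structure)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        # S: cross-attention from the current step to CRG-informed object features
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        # R: attention over image regions, queried by the target embedding
        self.region = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, obs_seq, crg_feats, region_feats, target_emb):
        # obs_seq:      (B, T, dim)  per-step observation features
        # crg_feats:    (B, N, dim)  object features enriched by the CRG prior
        # region_feats: (B, R, dim)  region features of the current frame
        # target_emb:   (B, 1, dim)  embedding of the target category
        h, _ = self.temporal(obs_seq, obs_seq, obs_seq)   # encode the trajectory
        cur = h[:, -1:, :]                                # current time step
        s, _ = self.spatial(cur, crg_feats, crg_feats)    # spatial context
        r, _ = self.region(target_emb + s, region_feats, region_feats)
        return r.squeeze(1)                               # representation for the policy

# Usage (shapes only):
# model = TSRAttention()
# feat = model(torch.randn(2, 10, 512), torch.randn(2, 22, 512),
#              torch.randn(2, 49, 512), torch.randn(2, 1, 512))
```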
