Abstract

The task of object goal navigation is to drive an embodied agent to find a given target using only visual observations. The mapping from visual observations to navigation actions is therefore central. Heterogeneous relationships among observed objects are an essential part of the scene graph and can guide the agent to the target more easily. In this work, we propose a novel Heterogeneous Zone Graph Visual Transformer formulation for graph representation and visual perception. It consists of two key ideas: (1) a Heterogeneous Zone Graph (HZG) that encodes heterogeneous target-related zones and their spatial information, allowing the agent to navigate efficiently; and (2) a Relation-wise Transformer Network (RTNet) that maps the relationships among previously observed objects to navigation actions. RTNet extracts rich node and edge features and pays more attention to the target-related zone. We model self-attention in the node-to-node encoder and cross-attention in the edge-to-node decoder. We evaluate our method on the AI2THOR dataset and show superior navigation performance. Code and datasets can be found at https://github.com/zhoukang12321/RTNet_VN_2023.
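
The abstract describes a node-to-node self-attention encoder followed by an edge-to-node cross-attention decoder that produces navigation actions. The following is a minimal PyTorch sketch of that attention pattern only; the module structure, dimensions, pooling, and action head are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RelationWiseTransformerSketch(nn.Module):
    """Sketch of the encoder/decoder attention pattern from the abstract:
    self-attention among graph-node features, then cross-attention from
    edge (relationship) features to the encoded nodes. Hyper-parameters
    are placeholders, not the paper's values."""

    def __init__(self, d_model=128, n_heads=4, n_actions=6):
        super().__init__()
        # Node-to-node encoder: nodes attend to each other.
        self.node_self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.node_norm = nn.LayerNorm(d_model)
        # Edge-to-node decoder: edge features query the encoded nodes.
        self.edge_cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.edge_norm = nn.LayerNorm(d_model)
        # Hypothetical head mapping the pooled relational embedding to action logits.
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, node_feats, edge_feats):
        # node_feats: (B, N_nodes, d_model); edge_feats: (B, N_edges, d_model)
        h, _ = self.node_self_attn(node_feats, node_feats, node_feats)
        h = self.node_norm(node_feats + h)                # encoded nodes
        e, _ = self.edge_cross_attn(edge_feats, h, h)     # edges attend to nodes
        e = self.edge_norm(edge_feats + e)
        return self.action_head(e.mean(dim=1))            # navigation-action logits

# Toy usage: 8 nodes and 12 edges from a (hypothetical) heterogeneous zone graph.
model = RelationWiseTransformerSketch()
logits = model(torch.randn(2, 8, 128), torch.randn(2, 12, 128))
print(logits.shape)  # torch.Size([2, 6])
```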
