Abstract

3D scene graph prediction helps intelligent agents gather information about their environments and perceive their semantics. However, constructing an effective graph is nontrivial given the complexity of natural scenes. Existing approaches to graph representation of 3D scenes reason in a flat manner: they try to distinguish every fine discrepancy among all relationships at once, ignoring the mechanism humans use to perform this task. Inspired by the role of the prefrontal cortex in hierarchical reasoning, we analyze the problem from a novel perspective: we explore hierarchical spatial layout cues in 3D space and navigate that hierarchy to make the 3D scene graph more accurate, following a strategy that proceeds from vertical division to horizontal propagation. To this end, we first encode contextual object features for fine-grained object category classification. Next, we build a bottom-up hierarchical graph to predict the remarkably diverse support relationships as a single concept, setting aside the numerous irrelevant relationships. Finally, equipped with spatially true and semantically meaningful support relationships, we focus on the local region layout and propagate semantic features to predict the remaining non-support relationships under the guidance of the referred hierarchical graph nodes. Experiments on the challenging 3DSSG benchmark show that our algorithm outperforms existing state-of-the-art methods and also alleviates the impact of the long-tailed distribution of the training data. Our code is available at https://github.com/HHrEtvP/HSLC-3DSG/.
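To make the "vertical division to horizontal propagation" idea concrete, the following is a minimal toy sketch and not the authors' learned model. It assumes axis-aligned bounding boxes and a purely geometric contact heuristic; the names `Box`, `support_parent`, `build_support_hierarchy`, and `sibling_pairs`, as well as the tolerance `tol`, are hypothetical. The sketch first derives a support hierarchy (vertical step), then restricts non-support relationship candidates to objects that share the same supporter (horizontal step).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one object (hypothetical toy representation)."""
    name: str
    xmin: float
    ymin: float
    zmin: float
    xmax: float
    ymax: float
    zmax: float


def overlaps_xy(a: Box, b: Box) -> bool:
    """True if the footprints of a and b overlap in the horizontal plane."""
    return (a.xmin < b.xmax and b.xmin < a.xmax
            and a.ymin < b.ymax and b.ymin < a.ymax)


def support_parent(obj: Box, boxes: List[Box], tol: float = 0.05) -> Optional[str]:
    """Name of a box whose top face touches obj's bottom face, or None.

    Assumption: contact within `tol` plus footprint overlap stands in for
    the learned support prediction described in the abstract.
    """
    candidates = [
        b for b in boxes
        if b is not obj and overlaps_xy(obj, b) and abs(obj.zmin - b.zmax) <= tol
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda b: b.zmax).name


def build_support_hierarchy(boxes: List[Box]) -> Dict[str, Optional[str]]:
    """Vertical step: map every object to its supporter (None for roots)."""
    return {b.name: support_parent(b, boxes) for b in boxes}


def sibling_pairs(hierarchy: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Horizontal step: candidate non-support pairs share the same supporter."""
    groups: Dict[str, List[str]] = {}
    for child, parent in hierarchy.items():
        if parent is not None:
            groups.setdefault(parent, []).append(child)
    pairs: List[Tuple[str, str]] = []
    for children in groups.values():
        ordered = sorted(children)
        pairs += [(a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:]]
    return pairs


# Toy scene: a table and a chair stand on the floor, a cup stands on the table.
boxes = [
    Box("floor", 0.0, 0.0, -0.1, 5.0, 5.0, 0.0),
    Box("table", 1.0, 1.0, 0.0, 2.0, 2.0, 0.8),
    Box("chair", 2.2, 1.0, 0.0, 2.8, 1.6, 0.9),
    Box("cup", 1.2, 1.2, 0.8, 1.3, 1.3, 0.9),
]
hierarchy = build_support_hierarchy(boxes)
candidates = sibling_pairs(hierarchy)
```

In this toy scene the chair and the table both rest on the floor, so they form the only candidate pair for a non-support relationship, while the cup is considered only within the table's local region. Restricting candidates this way is one plausible reading of how a hierarchy can prune irrelevant relationship pairs; the actual method uses learned features rather than box contact.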
