Abstract

Semantic understanding of unstructured environments plays an essential role in the autonomous planning and execution of whole-body humanoid locomotion and manipulation tasks. We introduce a new graph-based, data-driven method for the semantic representation of unknown environments based on visual sensor data streams. The proposed method extends our previous work, in which loco-manipulation scene affordances are detected in a fully unsupervised manner. We build a geometric primitive-based model of the perceived scene and assign interaction possibilities, i.e., affordances, to the individual primitives. The major contribution of this paper is the enrichment of the extracted scene representation with semantic object information through spatio-temporal fusion of primitives during perception. To this end, we combine the primitive-based scene representation with object detection methods to identify higher-level semantic structures in the scene. The qualitative and quantitative evaluation of the proposed method in various experiments in simulation and on the humanoid robot ARMAR-III demonstrates the effectiveness of the approach.
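
To make the described pipeline more concrete, the following minimal sketch illustrates one possible way to organize such a graph-based scene representation: geometric primitives carry affordance labels and are grouped into higher-level semantic objects once they are fused with object detection results. This is an illustrative assumption, not the paper's implementation; all names (Primitive, SemanticObject, SceneGraph, fuse_with_detection) are hypothetical.

```python
"""Illustrative sketch (not the authors' implementation) of a graph-based
scene representation in which geometric primitives carry affordance labels
and are fused into semantic objects. All names are hypothetical."""

from dataclasses import dataclass, field


@dataclass
class Primitive:
    """A geometric primitive extracted from the visual data stream."""
    primitive_id: int
    shape: str                                     # e.g. "plane", "cylinder"
    position: tuple                                # simplified 3D position (x, y, z)
    affordances: set = field(default_factory=set)  # e.g. {"support", "grasp"}


@dataclass
class SemanticObject:
    """A higher-level object hypothesis grouping several primitives."""
    label: str                                     # label from an object detector
    primitives: list = field(default_factory=list)


class SceneGraph:
    """Scene representation: primitives as nodes, object hypotheses on top."""

    def __init__(self):
        self.primitives = {}   # primitive_id -> Primitive
        self.objects = []      # list of SemanticObject

    def add_primitive(self, p: Primitive) -> None:
        self.primitives[p.primitive_id] = p

    def fuse_with_detection(self, label: str, primitive_ids: list) -> None:
        """Attach an object-detector label to a set of spatially and
        temporally consistent primitives (fusion is heavily simplified here)."""
        members = [self.primitives[i] for i in primitive_ids
                   if i in self.primitives]
        if members:
            self.objects.append(SemanticObject(label=label, primitives=members))


# Minimal usage example
scene = SceneGraph()
scene.add_primitive(Primitive(0, "plane", (0.5, 0.0, 0.75), {"support"}))
scene.add_primitive(Primitive(1, "cylinder", (0.5, 0.1, 0.80), {"grasp"}))
scene.fuse_with_detection("cup", [1])
print([obj.label for obj in scene.objects])   # -> ['cup']
```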
