Abstract

Enhancing perception of the local environment with semantic information, such as the room type, is an important ability for agents acting in that environment. Such high-level knowledge can reduce the effort needed for tasks such as object detection. This paper shows how to extract the room label from a small number of room percepts taken from a particular viewpoint (such as the door frame when entering the room). This functionality is similar to the human ability to form an impression of a scene from a quick glance. We propose a new three-dimensional (3D) spatial feature vector that captures the layout of a scene from extracted planar surfaces. The trained models emulate the human brain's sensitivity to the 3D geometry of a room. Further, we show that our descriptor complements the information encoded by the Gist feature vector, a first attempt to model the brain area behind this sensitivity. The Gist feature vector extracts global scene properties from edge information in 2D depictions of the scene. Both features can be fused, resulting in a system that meets our goal of combining psychological insights on human scene perception with physical properties of environments. This paper provides detailed insights into the nature of our spatial descriptor.

