Abstract

Visual relationship detection aims to detect interactions between objects in flat images, where visual appearance and the spatial relationship between objects are two key factors. However, most existing methods extract only 2D information about objects from flat images, which lacks the depth information available in actual 3D space. To obtain and utilize depth information for visual relationship detection, we construct the Depth-VRD dataset as an extension of the VRD dataset and propose the adaptive depth-aware visual relationship detection network (ADVRD). For visual appearance, we propose a depth-aware visual fusion module that uses additional depth visual information to guide where RGB visual features need to be strengthened. For spatial relationships, to generate a more accurate depth representation when locating an object's depth spatial position, we propose an adaptive depth spatial location method that uses regional information variance to measure the information relevance of each small region within the object bounding box. Experimental results show that depth information significantly improves the performance of our network on visual relationship detection tasks, especially for zero-shot relationships.
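The variance-based weighting described above can be illustrated with a minimal sketch: the depth crop of an object's bounding box is split into a small grid, each cell's mean depth is weighted by its normalized depth variance, and the weighted average serves as the depth representation. The grid size and the exact weighting rule here are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def adaptive_depth_location(depth_crop, grid=4):
    """Variance-weighted depth descriptor for an object bounding box.

    Splits the depth crop into a grid x grid layout of cells, then
    weights each cell's mean depth by its normalized depth variance,
    so regions with more depth structure contribute more. This is an
    illustrative sketch, not the authors' exact method.
    """
    h, w = depth_crop.shape
    means, varis = [], []
    for i in range(grid):
        for j in range(grid):
            cell = depth_crop[i * h // grid:(i + 1) * h // grid,
                              j * w // grid:(j + 1) * w // grid]
            means.append(cell.mean())   # per-region average depth
            varis.append(cell.var())    # per-region information variance
    varis = np.asarray(varis)
    total = varis.sum()
    # Fall back to uniform weights when the crop has no depth variation.
    weights = varis / total if total > 0 else np.full(varis.size, 1.0 / varis.size)
    return float(np.dot(weights, means))
```

For a constant-depth crop the weights become uniform and the descriptor reduces to the plain mean depth; for crops with varied depth, high-variance regions dominate the estimate.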
