Abstract

In this paper, we propose a fixed-size object encoding method called FOE-VRD to improve the performance of visual relationship detection tasks. For each relationship triplet in a given image, FOE-VRD considers not only the subject and object but also uses one fixed-size vector to encode all background objects in the image. In this way, we introduce more background knowledge to help the relationship detector achieve better performance. We first use a regular convolutional neural network as a feature extractor to generate high-level features of the input images. Then, for each relationship triplet, we apply ROI pooling to the bounding boxes of the subject and object to obtain two corresponding feature vectors. Moreover, we propose a novel method to encode all background objects in each image using one fixed-size vector (i.e., the FBE vector). By concatenating the three generated feature vectors, we encode each relationship as one fixed-size vector, which is then fed into a fully connected neural network to obtain the predicate classification result. Experimental results on the VRD and Visual Genome datasets show that the proposed method performs well on both predicate classification and relationship detection tasks, especially in the zero-shot detection setting.
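
The abstract describes a three-part encoding (subject ROI features, object ROI features, and a fixed-size background-objects encoding) followed by a fully connected predicate classifier. The sketch below illustrates that pipeline in PyTorch under stated assumptions: the VGG16 backbone, the layer sizes, and the FBE aggregation (summing ROI features of background objects) are illustrative choices, not the authors' exact configuration.

```python
# Minimal sketch of the FOE-VRD pipeline described in the abstract (assumptions:
# VGG16 backbone, 512-d projections, FBE built by summing background ROI features).
import torch
import torch.nn as nn
import torchvision
from torchvision.ops import roi_pool


class FOEVRDSketch(nn.Module):
    def __init__(self, num_predicates, feat_dim=512):
        super().__init__()
        # Feature extractor: a regular CNN backbone (here VGG16 conv layers).
        self.backbone = torchvision.models.vgg16(weights=None).features
        self.roi_size = (7, 7)
        roi_feat = 512 * 7 * 7
        self.fc_roi = nn.Linear(roi_feat, feat_dim)   # subject / object features
        self.fc_fbe = nn.Linear(roi_feat, feat_dim)   # per-background-object features
        # Classifier over the concatenated [subject, object, FBE] vector.
        self.classifier = nn.Sequential(
            nn.Linear(3 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, num_predicates),
        )

    def forward(self, image, subj_box, obj_box, background_boxes):
        # image: (1, 3, H, W); each *_box argument is an (N, 4) tensor of
        # [x1, y1, x2, y2] coordinates in the image frame.
        fmap = self.backbone(image)
        scale = fmap.shape[-1] / image.shape[-1]      # spatial scale for ROI pooling

        def pooled(boxes):
            feats = roi_pool(fmap, [boxes], self.roi_size, spatial_scale=scale)
            return feats.flatten(start_dim=1)

        subj = self.fc_roi(pooled(subj_box))          # (1, feat_dim)
        obj = self.fc_roi(pooled(obj_box))            # (1, feat_dim)
        # FBE vector: encode all background objects into one fixed-size vector;
        # here their pooled ROI features are summed (an assumed aggregation).
        if background_boxes.numel() > 0:
            fbe = self.fc_fbe(pooled(background_boxes)).sum(dim=0, keepdim=True)
        else:
            fbe = torch.zeros_like(subj)
        triplet = torch.cat([subj, obj, fbe], dim=1)  # one fixed-size relationship encoding
        return self.classifier(triplet)               # predicate logits
```

Because the FBE vector has a fixed size regardless of how many background objects the image contains, the concatenated encoding can be fed to a standard fully connected classifier, which is the property the abstract emphasizes.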
