Abstract

In this paper, we propose a fixed-size object encoding method called FOE-VRD to improve the performance of visual relationship detection tasks. For each relationship triplet in a given image, FOE-VRD considers not only the subject and object but also uses one fixed-size vector to encode all background objects in the image. In this way, we introduce more background knowledge to help the relationship detector achieve better performance. We first use a regular convolutional neural network as a feature extractor to generate high-level features of the input images. Then, for each relationship triplet, we apply ROI pooling to the bounding boxes of the subject and object to obtain two corresponding feature vectors. Moreover, we propose a novel method to encode all background objects in each image using one fixed-size vector (i.e., the FBE vector). By concatenating the three generated feature vectors, we encode each relationship as one fixed-size vector, which is then fed into a fully connected neural network to obtain the predicate classification result. Experimental results on the VRD and Visual Genome datasets show that the proposed method performs well on both predicate classification and relationship detection tasks, especially in the zero-shot detection setting.
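
The abstract describes a three-part encoding (subject ROI features, object ROI features, and a fixed-size background-objects encoding) followed by a fully connected predicate classifier. The sketch below illustrates that pipeline in PyTorch under stated assumptions: the VGG16 backbone, the layer sizes, and the FBE aggregation (summing ROI features of background objects) are illustrative choices, not the authors' exact configuration.

```python
# Minimal sketch of the FOE-VRD pipeline described in the abstract (assumptions:
# VGG16 backbone, 512-d projections, FBE built by summing background ROI features).
import torch
import torch.nn as nn
import torchvision
from torchvision.ops import roi_pool


class FOEVRDSketch(nn.Module):
    def __init__(self, num_predicates, feat_dim=512):
        super().__init__()
        # Feature extractor: a regular CNN backbone (here VGG16 conv layers).
        self.backbone = torchvision.models.vgg16(weights=None).features
        self.roi_size = (7, 7)
        roi_feat = 512 * 7 * 7
        self.fc_roi = nn.Linear(roi_feat, feat_dim)   # subject / object features
        self.fc_fbe = nn.Linear(roi_feat, feat_dim)   # per-background-object features
        # Classifier over the concatenated [subject, object, FBE] vector.
        self.classifier = nn.Sequential(
            nn.Linear(3 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, num_predicates),
        )

    def forward(self, image, subj_box, obj_box, background_boxes):
        # image: (1, 3, H, W); each *_box argument is an (N, 4) tensor of
        # [x1, y1, x2, y2] coordinates in the image frame.
        fmap = self.backbone(image)
        scale = fmap.shape[-1] / image.shape[-1]      # spatial scale for ROI pooling

        def pooled(boxes):
            feats = roi_pool(fmap, [boxes], self.roi_size, spatial_scale=scale)
            return feats.flatten(start_dim=1)

        subj = self.fc_roi(pooled(subj_box))          # (1, feat_dim)
        obj = self.fc_roi(pooled(obj_box))            # (1, feat_dim)
        # FBE vector: encode all background objects into one fixed-size vector;
        # here their pooled ROI features are summed (an assumed aggregation).
        if background_boxes.numel() > 0:
            fbe = self.fc_fbe(pooled(background_boxes)).sum(dim=0, keepdim=True)
        else:
            fbe = torch.zeros_like(subj)
        triplet = torch.cat([subj, obj, fbe], dim=1)  # one fixed-size relationship encoding
        return self.classifier(triplet)               # predicate logits
```

Because the FBE vector has a fixed size regardless of how many background objects the image contains, the concatenated encoding can be fed to a standard fully connected classifier, which is the property the abstract emphasizes.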
