Abstract

Visual relationship detection, not only needs to identify the targets in the image and their position, but also to identify the interrelationship between the targets, is a comprehensive task including object detection, positioning, image classification. In this paper, pyramid convolutional networks are embedded in the image feature extraction module, and three convolutional kernels of different scales and depths are used to increase the sensing field of the network and ensure multi-scale feature extraction. At the same time, a recurrent neural network is introduced, which combines the background information in the image to improve the accuracy of semantic relationship detection.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call