Visual relationship detection (VRD) is an important research direction in image processing. Building on object recognition and bounding-box localization, it aims to infer the relationships between objects, and it is a key component of both scene graph generation and multimodal knowledge graph construction. To improve visual relationship detection, this work fuses embeddings of object location and visual feature information and evaluates the approach on the VRD dataset. First, the causes of errors in existing methods are analyzed, and a pre-training dataset, vrd_P, is built for data augmentation. Next, an entity relationship network (Entities-net) is constructed from the characteristics of the VRD dataset to guide object recognition. Finally, a visual relationship detection model, LTransE, is proposed that jointly represents object features and object locations for relationship detection. The results show that fusing image location and feature embeddings effectively improves visual relationship detection performance.
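The abstract does not describe LTransE's internals. The sketch below only illustrates the general translation-embedding idea its name suggests: a subject and an object are each mapped into a shared relation space, and a predicate is treated as a translation vector, so the plausible predicate is the one for which subject + predicate lies closest to object. Every detail is an assumption for illustration and not the authors' model: the location encoding (normalized box corners plus relative area), fusion by plain concatenation, the hand-set toy projection matrix, and all function names (`location_embedding`, `fuse`, `project`, `translation_distance`, `predict_predicate`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def location_embedding(box: Sequence[float], img_w: float, img_h: float) -> Vector:
    """Hypothetical location encoding: normalized (x1, y1, x2, y2) plus relative area."""
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    return [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h, area / (img_w * img_h)]


def fuse(appearance: Vector, location: Vector) -> Vector:
    """Joint feature/location representation by simple concatenation (an assumption)."""
    return list(appearance) + list(location)


def project(matrix: List[Vector], x: Vector) -> Vector:
    """Linear map into the relation space, i.e. matrix @ x."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def translation_distance(subj: Vector, rel: Vector, obj: Vector) -> float:
    """TransE-style distance ||subj + rel - obj||_2; smaller means more plausible."""
    return math.sqrt(sum((s + r - o) ** 2 for s, r, o in zip(subj, rel, obj)))


def predict_predicate(subj: Vector, obj: Vector, relations: Dict[str, Vector]) -> str:
    """Pick the predicate whose translation vector best maps subject onto object."""
    return min(relations, key=lambda name: translation_distance(subj, relations[name], obj))


# Toy example: 2-d appearance features, boxes in a 100x100 image.
subj_vec = fuse([1.0, 0.0], location_embedding((0, 0, 50, 50), 100, 100))
obj_vec = fuse([0.0, 1.0], location_embedding((50, 50, 100, 100), 100, 100))

# Hypothetical projection that keeps only the top-left corner (x1, y1).
W = [[0, 0, 1, 0, 0, 0, 0],
     [0, 0, 0, 1, 0, 0, 0]]
subj_proj, obj_proj = project(W, subj_vec), project(W, obj_vec)

# Image y grows downward, so a positive y translation means the object is lower.
relations = {"above": [0.5, 0.5], "below": [-0.5, -0.5]}
predicted = predict_predicate(subj_proj, obj_proj, relations)
print(predicted)  # above
```

In a trained model, the projection matrices and predicate vectors would be learned from data rather than set by hand. The toy values here only make the geometry easy to follow: the subject's corner sits at (0, 0) and the object's at (0.5, 0.5), so the "above" translation maps one exactly onto the other.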