Abstract

Visual relationship detection (VRD) aims to locate objects and recognize their pairwise relationships in order to parse scene graphs. To enable a higher-level understanding of the visual scene, we propose a symmetric fusion learning model for visual relationship detection and scene graph parsing. We integrate object and relationship features at both the visual and semantic levels for better relationship feature mapping. First, we apply feature fusion to construct the visual module and introduce a semantic representation learning module combined with large-scale external knowledge. We then minimize the loss by matching the visual and semantic embeddings using our designed symmetric learning module. This symmetric learning module, based on reverse cross-entropy, boosts cross-entropy symmetrically and performs reverse supervision for inaccurate annotations. Our model is compared with other state-of-the-art methods on two public datasets. Experiments show that our proposed model achieves encouraging performance across various metrics on both datasets. Further detailed analysis demonstrates that the proposed method performs better by partially alleviating the impact of inaccurate annotations.
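The symmetric learning module described above pairs standard cross-entropy with a reverse cross-entropy term so that inaccurate relationship annotations exert less influence on training. The abstract does not give the exact formulation, so the sketch below is only a minimal, hypothetical PyTorch illustration of a symmetric cross-entropy style loss; the function name, the weights alpha and beta, and the clamp value used in place of log(0) are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def symmetric_cross_entropy(logits, targets, alpha=1.0, beta=1.0, log_zero_clamp=-4.0):
    # Standard cross-entropy: -sum_k q(k) * log p(k), with q the one-hot annotation.
    ce = F.cross_entropy(logits, targets)

    # Reverse cross-entropy: -sum_k p(k) * log q(k).
    # log(0) is undefined for the zero entries of the one-hot label,
    # so it is clamped to a finite constant (an assumed value here).
    pred = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes=logits.size(1)).float()
    log_one_hot = torch.clamp(torch.log(one_hot + 1e-12), min=log_zero_clamp)
    rce = -(pred * log_one_hot).sum(dim=1).mean()

    # Weighted symmetric combination of the two directions.
    return alpha * ce + beta * rce

# Example usage with hypothetical shapes (e.g., 70 predicate classes):
logits = torch.randn(8, 70)
labels = torch.randint(0, 70, (8,))
loss = symmetric_cross_entropy(logits, labels)
```

The reverse term supervises the (possibly noisy) annotation with the model's own prediction, which is why this family of losses is more tolerant of mislabeled relationship instances than cross-entropy alone.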
