Abstract

Visual relationship detection (VRD), a challenging task in image understanding, suffers from the vague connection between relationship patterns and visual appearance. This issue is caused by the high diversity of relationship-independent visual appearance, where inexplicit and redundant cues may not contribute to relationship detection and may even confuse the detector. Previous relationship detection models have made remarkable progress by leveraging external textual information or scene-level interaction to complement relationship detection cues. In this work, we propose the Contextual Coefficients Excitation Feature (CCEF), a focal visual representation that is adaptively recalibrated from the original visual feature responses by explicitly modeling the interdependencies between features and their contextual coefficients. Specifically, the contextual coefficients are computed from both spatial coefficients and generated-label coefficients. In addition, a conditional Wasserstein Generative Adversarial Network (WGAN) regularized with a relationship classification loss is designed to alleviate the inadequate training of the generated-label coefficients caused by the long-tailed distribution of relationships. Experimental results demonstrate that our method effectively improves relationship detection. In particular, it improves the recall of predicting unseen relationships on the zero-shot set from 8.5% to 23.2%.
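The recalibration described above resembles squeeze-and-excitation gating: contextual coefficients are mapped to channel-wise weights that rescale the visual feature. The following PyTorch sketch is a minimal illustration of this idea under our own assumptions; the module name, the dimensions (spatial_dim, label_dim, feat_dim), and the fusion by concatenation are hypothetical choices, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ContextualExcitation(nn.Module):
    """Hypothetical sketch: gate visual features with contextual coefficients."""
    def __init__(self, feat_dim=512, spatial_dim=8, label_dim=300):
        super().__init__()
        # Fuse spatial coefficients (e.g. box geometry) with generated-label
        # coefficients (e.g. embeddings of detected object labels), then map
        # them to per-channel gating weights in (0, 1).
        self.gate = nn.Sequential(
            nn.Linear(spatial_dim + label_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, feat_dim),
            nn.Sigmoid(),
        )

    def forward(self, visual_feat, spatial_coef, label_coef):
        # Channel-wise recalibration of the original visual feature responses.
        context = torch.cat([spatial_coef, label_coef], dim=-1)
        return visual_feat * self.gate(context)
```

Likewise, a hedged sketch of the conditional WGAN objective with a relationship-classification regularizer; critic, classifier, the conditioning input cond, and the weight lam are illustrative placeholders, and the gradient penalty used in WGAN training is omitted for brevity.

```python
import torch.nn.functional as F

def critic_loss(critic, real_coef, fake_coef, cond):
    # Wasserstein critic objective: score real label coefficients above
    # generated ones, conditioned on cond (gradient penalty omitted).
    return critic(fake_coef, cond).mean() - critic(real_coef, cond).mean()

def generator_loss(critic, classifier, fake_coef, cond, predicate_labels, lam=1.0):
    # Adversarial term plus a classification regularizer that keeps the
    # generated label coefficients discriminative for relationship predicates.
    adv = -critic(fake_coef, cond).mean()
    cls = F.cross_entropy(classifier(fake_coef), predicate_labels)
    return adv + lam * cls
```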

Highlights

  • With the rapid development of deep learning and image recognition [1,2,3,4,5], visual relationship detection [6], a higher-level visual understanding task, has become a popular research topic

  • We propose a novel Contextual Coefficients Excitation Feature (CCEF), which is a focal representation based on a new relationship space

  • CCEF (Ours − A + S + G): uses the visual representation described in Section 3.1

Introduction

With the rapid development of deep learning and image recognition [1,2,3,4,5], visual relationship detection [6], a higher-level visual understanding task, has become a popular research topic. Visual relationship detection aims to recognize various visually observable predicates between a subject and an object, where the subject and object are a pair of objects in the image. The task is challenging because most existing relationship detection methods [8,9] treat each type of relationship predicate as a single class, so the visual appearance within a class is highly diverse and varies greatly across different relationship instances. This visual diversity undermines the correlation between relationship predicates and visual appearance and confuses the detector.

