Abstract

Scene graph generation (SGGen) is a challenging task due to the complex visual context of an image. Intuitively, the human visual system can volitionally focus on attended regions driven by salient stimuli associated with visual cues. For example, to infer the relationship between a man and a horse, the interaction between the human leg and the horseback provides strong visual evidence for predicting the predicate ride. In addition, the attended face region can help determine the object man. To date, most existing works have studied SGGen by extracting coarse-grained bounding-box features, while understanding fine-grained visual regions has received limited attention. To mitigate this drawback, this article proposes a region-aware attention learning method. The key idea is to explicitly construct an attention space to explore salient regions for object and predicate inference. First, we extract a set of regions in an image with the standard detection pipeline, where each region regresses to an object. Second, we propose the object-wise attention graph neural network (GNN), which incorporates attention modules into the graph structure to discover attended regions for object inference. Third, we build the predicate-wise co-attention GNN to jointly highlight the subject's and object's attended regions for predicate inference. In particular, each subject-object pair is connected with one of the latent predicates to form a triplet. The proposed intra-triplet and inter-triplet learning mechanism helps discover pair-wise attended regions for predicate inference. Extensive experiments on two popular benchmarks demonstrate the superiority of the proposed method. Additional ablation studies and visualizations further validate its effectiveness.
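The abstract describes attention modules embedded in a graph structure over detected regions. As a rough illustration only, the following is a minimal sketch of attention-weighted message passing over region features, in the spirit of the object-wise attention GNN; the function name, weight matrices, and scaled dot-product scoring are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_aggregate(node_feats, adj, w_query, w_key):
    """One round of attention-weighted message passing over region nodes.

    node_feats: (N, d) region features; adj: (N, N) graph adjacency;
    w_query, w_key: (d, d) learned projections (hypothetical names).
    """
    q = node_feats @ w_query                    # (N, d) query projections
    k = node_feats @ w_key                      # (N, d) key projections
    scores = q @ k.T / np.sqrt(q.shape[1])      # (N, N) pairwise compatibility
    scores = np.where(adj > 0, scores, -1e9)    # restrict attention to graph edges
    alpha = softmax(scores, axis=1)             # attention weights over neighbors
    return alpha @ node_feats                   # context features per region
```

Each region thus aggregates features from the neighbors it attends to most strongly, which is one common way attention modules are incorporated into a GNN layer.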
