Abstract

Extreme instance imbalance among categories and combinatorial explosion make the recognition of Human-Object Interaction (HOI) a challenging task. Few studies have addressed both challenges directly. Motivated by the success of few-shot learning, which learns a robust model from a few instances, we formulate HOI as a few-shot task in a meta-learning framework to alleviate these challenges. Because HOI is intrinsically diverse and interactive, we propose a Semantic-guided Attentive Prototypes Network (SAPNet) framework that learns a semantic-guided metric space in which HOI recognition is performed by computing distances to the attentive prototypes of each class. Specifically, the model generates attentive prototypes guided by the category names of actions and objects, which highlight the commonalities of images from the same HOI class. In addition, we design two alternative prototype calculation methods, the Prototypes Shift (PS) approach and the Hallucinatory Graph Prototypes (HGP) approach, which explore how to learn suitable category prototype representations for HOI. Finally, to realize the task of few-shot HOI, we reorganize two HOI benchmark datasets under two split strategies, yielding HICO-NN, TUHOI-NN, HICO-NF, and TUHOI-NF. Extensive experimental results on these datasets demonstrate the effectiveness of the proposed SAPNet approach.
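To make the idea of classifying by distance to attentive prototypes concrete, the following is a minimal sketch and not the SAPNet implementation. It assumes that each class has a single semantic vector (standing in for an embedding of its action and object names), that attention weights are a softmax over the dot product between support features and that vector, and that queries are assigned to the nearest prototype under squared Euclidean distance. The function names, the toy features, and the `temperature` parameter are all hypothetical.

```python
import numpy as np

def attentive_prototype(support, semantic, temperature=1.0):
    """Attention-weighted mean of support features for one class.

    Assumption: weights are a softmax over the dot product between each
    support feature and the class semantic vector.
    """
    scores = support @ semantic / temperature      # shape (n_shots,)
    scores = scores - scores.max()                 # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ support                       # shape (dim,)

def classify(query, prototypes):
    """Index of the nearest prototype under squared Euclidean distance."""
    dists = ((prototypes - query) ** 2).sum(axis=1)
    return int(np.argmin(dists))

# Toy 2-way 5-shot episode with synthetic features (illustrative only).
rng = np.random.default_rng(0)
dim, n_shots = 8, 5
centers = np.stack([np.full(dim, 2.0), np.full(dim, -2.0)])
# Hypothetical semantic vectors: unit vectors pointing at each class center.
semantics = centers / np.linalg.norm(centers, axis=1, keepdims=True)
supports = [c + 0.1 * rng.standard_normal((n_shots, dim)) for c in centers]

prototypes = np.stack(
    [attentive_prototype(s, w) for s, w in zip(supports, semantics)]
)
query = centers[1] + 0.1 * rng.standard_normal(dim)
pred = classify(query, prototypes)
```

In this sketch a uniform weighting would recover the plain class-mean prototype; the semantic vector only changes how much each support example contributes.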
