Abstract

Few-shot human-object interaction (FS-HOI) recognition aims to infer novel interactions between human actions and surrounding objects from only a few labeled instances. This helps alleviate the long-tail and combinatorial-explosion problems in human-object interaction (HOI) recognition. Nevertheless, existing FS-HOI methods only model the relationships between labeled and unlabeled samples in the Euclidean domain, neglecting the rich relational structures of visual information among labeled samples and between human actions and objects. Accordingly, we tackle the few-shot HOI task in the non-Euclidean domain and present a graph-based model, the task-oriented high-order context graph network (THCG-Net). It contains a task attention module (TA-Module) and a high-order context graph module (HG-Module). The TA-Module uses an attention mechanism driven by task information to build a task-oriented space; embedding the visual features into this space captures the information discriminative for the current task (episode). The HG-Module constructs a task-level graph and treats context information as high-order knowledge, providing discriminative guidance for propagating visual information. It captures the discriminability among different categories while adaptively highlighting the commonality of related categories, which effectively transfers knowledge across related categories. Extensive experiments on two benchmark datasets, HICO-FS and TUHOI-FS, demonstrate that THCG-Net significantly outperforms state-of-the-art approaches, confirming its effectiveness in recognizing human actions and surrounding objects in few-shot scenarios.
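To make the two-stage idea concrete, the sketch below illustrates one plausible reading of the pipeline: features are first re-weighted by an episode-level (task-oriented) attention, then query labels are inferred by propagating support labels over a task-level similarity graph. All function names, the softmax gating, and the cosine-similarity adjacency are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def task_attention(support, query):
    """Hypothetical TA-Module sketch: a task prototype (mean of the
    support features) gates each feature dimension via a softmax,
    emphasising dimensions relevant to the current episode."""
    task_proto = support.mean(axis=0)                        # (d,)
    weights = np.exp(task_proto) / np.exp(task_proto).sum()  # per-dimension attention
    return support * weights, query * weights

def propagate(support, labels, query, n_classes):
    """Hypothetical stand-in for the HG-Module: one step of label
    propagation over a task-level cosine-similarity graph."""
    feats = np.vstack([support, query])
    norm = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    adj = norm @ norm.T                                      # task-level graph
    one_hot = np.zeros((len(feats), n_classes))
    one_hot[np.arange(len(labels)), labels] = 1.0            # support labels only
    scores = adj @ one_hot                                   # propagate label evidence
    return scores[len(support):].argmax(axis=1)              # query predictions

# Toy 2-way episode: two support samples per class, one query near class 0.
support = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
query = np.array([[0.95, 0.05]])
s_emb, q_emb = task_attention(support, query)
pred = propagate(s_emb, labels, q_emb, n_classes=2)
print(pred)  # → [0]
```

The sketch only shows the information flow (episode-conditioned embedding followed by graph propagation); the paper's high-order context modeling would replace the single similarity-based propagation step.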
