Abstract

Few-shot action recognition has attracted increasing attention in recent years, but it remains challenging because transferable knowledge that generalizes to novel classes must be learned from only a few labeled samples. Despite recent progress, most few-shot action recognition methods focus on the global characteristics of samples and ignore their local characteristics, which weakens the generalization ability of the model. In this paper, we propose a task-aware dual-representation network (TADRNet) for few-shot action recognition, which learns to adapt video representations to novel tasks in a meta-learning manner. TADRNet consists mainly of a global relational graph subnetwork (GRG) and a fine-grained local representation subnetwork (FLR), so that it considers both the global and local characteristics of samples. From a global perspective, GRG explores the relations across support-query sample pairs using a relational graph neural network. To facilitate few-shot visual learning, we propose a novel hybrid semantic attention module (HSA) that enhances the discriminability of support and query features. From a local perspective, FLR exploits the local characteristics of samples to refine the classification results obtained by GRG and thus improve classification accuracy. Extensive experiments on four challenging benchmarks show that the proposed TADRNet significantly outperforms a variety of state-of-the-art few-shot action recognition methods.

