Abstract

Most previous few-shot action recognition works process the temporal and spatial features of videos separately, which limits their ability to extract comprehensive representations. In this paper, a novel hybrid attentive prototypical network (HAPN) framework for few-shot action recognition is proposed. Distinguished by its joint processing of temporal and spatial information, the HAPN framework handles these dimensions jointly from feature extraction through the attention module, thereby enhancing its ability to recognize actions. Our framework utilizes the R(2+1)D backbone network to extract integrated temporal and spatial features, ensuring a comprehensive understanding of video content. Additionally, our framework introduces the novel Residual Tri-dimensional Attention (ResTriDA) mechanism, specifically designed to augment feature information across the temporal, spatial, and channel dimensions. ResTriDA dynamically enhances crucial aspects of video features by amplifying channel-wise features that are significant for action distinction, accentuating spatial details vital for capturing the essence of actions within frames, and emphasizing temporal dynamics to capture movement over time. We further propose a prototypical attentive matching module (PAM), built on the concept of metric learning, to mitigate the overfitting common in few-shot tasks. We evaluate our HAPN framework on three classical few-shot action recognition datasets: Kinetics-100, UCF101, and HMDB51. The results indicate that our framework significantly outperforms state-of-the-art methods. Notably, on the 1-shot task, accuracy improves by 9.8% on UCF101, 3.9% on HMDB51, and 12.4% on Kinetics-100. These gains confirm the robustness and effectiveness of our approach in leveraging limited data for precise action recognition.
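To make the tri-dimensional attention idea concrete, the sketch below shows one plausible way to gate a 5D video feature map (batch, channels, time, height, width) along the channel, spatial, and temporal dimensions with a residual connection, as the abstract describes. This is not the authors' implementation: the class name `ResTriDASketch`, the pooling choices, kernel sizes, and the reduction ratio are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class ResTriDASketch(nn.Module):
    """Illustrative tri-dimensional (channel / spatial / temporal) attention
    with a residual connection, applied to features of shape (B, C, T, H, W).
    A minimal sketch of the idea, not the paper's ResTriDA code."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel gate: squeeze spatio-temporal dims, re-weight channels.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial gate: collapse channels, weight each (h, w) location per frame.
        self.spatial_gate = nn.Sequential(
            nn.Conv3d(1, 1, kernel_size=(1, 7, 7), padding=(0, 3, 3)),
            nn.Sigmoid(),
        )
        # Temporal gate: collapse channels and space, weight each frame.
        self.temporal_gate = nn.Sequential(
            nn.Conv1d(1, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W), e.g. a feature map from an R(2+1)D backbone.
        out = x * self.channel_gate(x)                             # channel re-weighting
        spatial_desc = out.mean(dim=1, keepdim=True)               # (B, 1, T, H, W)
        out = out * self.spatial_gate(spatial_desc)                # spatial re-weighting
        temporal_desc = out.mean(dim=(1, 3, 4)).unsqueeze(1)       # (B, 1, T)
        temporal_w = self.temporal_gate(temporal_desc)             # (B, 1, T)
        out = out * temporal_w.unsqueeze(-1).unsqueeze(-1)         # temporal re-weighting
        return x + out                                             # residual connection


if __name__ == "__main__":
    feats = torch.randn(2, 64, 8, 14, 14)      # hypothetical backbone features
    attended = ResTriDASketch(64)(feats)
    print(attended.shape)                       # torch.Size([2, 64, 8, 14, 14])
```

The attended features produced this way could then feed a prototype-based matching step in the spirit of the PAM module, where class prototypes are built from the few support clips and query clips are classified by their distance to those prototypes; the exact matching formulation used in the paper is not reproduced here.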
