Abstract

In the digital age, with the continuous emergence of large-scale video data, video understanding has become increasingly important. As a core domain, action recognition has garnered widespread attention. However, video is high-dimensional and contains human action information at multiple scales, which makes it difficult for conventional attention mechanisms to capture complex action information. To improve action recognition performance, a Hybrid Attention-guided ConvNeXt-GRU Network (HACG) is proposed. Specifically, A Novel Attention Mechanism (ANM) is constructed by integrating a parameter-free attention module into ConvNeXt, enabling the preliminary extraction of important features without adding extra parameters. A Multiscale Hybrid Attention Module (MHAM) then adopts an improved, efficient Selective Kernel Network (SKNet) to adaptively calibrate channel features, enhancing the model's ability to perceive features at different scales while strengthening inter-channel correlation. MHAM further incorporates Atrous Spatial Pyramid Pooling (ASPP) to extract local and global information from different regions. Finally, MHAM is combined with a Gated Recurrent Unit (GRU) to capture the interdependence between space and time. Experimental results on the UCF-101, HMDB-51, and Kinetics-400 datasets show that HACG is competitive with state-of-the-art methods. This indicates that HACG can more effectively capture important features and suppress noise interference while maintaining a lower computational load, making it a promising choice for action recognition tasks.
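The abstract's parameter-free attention module can be illustrated with a SimAM-style energy-based sketch: each spatial position is weighted by its deviation from the channel mean, then gated with a sigmoid, introducing no learnable parameters. The exact formulation used in ANM is not specified in the abstract, so this NumPy sketch is an assumption for illustration only.

```python
import numpy as np

def parameter_free_attention(x, lam=1e-4):
    """SimAM-style parameter-free attention over a (C, H, W) feature map.

    Per channel, each position gets an energy score proportional to its
    squared deviation from the channel mean, normalized by the channel
    variance; a sigmoid of that energy gates the input. No learnable
    parameters are involved. (Hypothetical sketch, not the paper's ANM.)
    """
    c, h, w = x.shape
    n = h * w - 1                                   # denominator from SimAM's energy derivation
    mu = x.mean(axis=(1, 2), keepdims=True)         # per-channel spatial mean
    d = (x - mu) ** 2                               # squared deviation per position
    var = d.sum(axis=(1, 2), keepdims=True) / n     # per-channel variance estimate
    energy = d / (4.0 * (var + lam)) + 0.5          # energy term; lam avoids division by zero
    return x * (1.0 / (1.0 + np.exp(-energy)))      # sigmoid-gated features

# Example: apply to a random 8-channel 16x16 feature map.
feat = np.random.default_rng(0).standard_normal((8, 16, 16))
out = parameter_free_attention(feat)
```

Because the gate is a sigmoid in (0, 1), the module rescales features without changing their shape or sign, which is why it can be dropped into a ConvNeXt stage without extra parameters.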
