Abstract
In the digital age, with the continuous emergence of large-scale video data, video understanding has become increasingly important. As a core domain, action recognition has garnered widespread attention. However, video exhibits high-dimensional properties and contains human action information at multiple scales, which makes it difficult for conventional attention mechanisms to capture complex action information. To improve the performance of action recognition, a Hybrid Attention-guided ConvNeXt-GRU Network (HACG) is proposed. Specifically, a Novel Attention Mechanism (ANM) is constructed by integrating a parameter-free attention module into ConvNeXt, enabling the preliminary extraction of important features without adding extra parameters. Then, a Multiscale Hybrid Attention Module (MHAM) adopts an improved, efficient Selective Kernel Network (SKNet) to adaptively calibrate channel features. In this way, the module enhances the model's ability to perceive features at different scales while strengthening the correlations between channels. Furthermore, MHAM incorporates an Atrous Spatial Pyramid Pooling (ASPP) module to extract local and global information from different regions. Finally, MHAM is integrated with a Gated Recurrent Unit (GRU) to capture the interdependence between space and time. Experimental results show that HACG is highly competitive with state-of-the-art methods on the UCF-101, HMDB-51, and Kinetics-400 datasets. This indicates that HACG can more effectively capture important features and suppress noise interference while maintaining a lower computational load, making HACG a highly promising choice for action recognition tasks.
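The abstract does not specify which parameter-free attention module is integrated into ConvNeXt; one common choice for this role is a SimAM-style energy-based attention, which weights each activation by how much it deviates from its channel's spatial mean, using no learnable parameters. A minimal NumPy sketch under that assumption (the function name and `lam` regularizer are illustrative, not from the paper):

```python
import numpy as np

def parameter_free_attention(x, lam=1e-4):
    """SimAM-style parameter-free attention over a feature map.

    x: feature map of shape (C, H, W).
    Returns x rescaled by per-position weights in (0, 1); no weights
    are learned, so the module adds zero parameters to the backbone.
    """
    c, h, w = x.shape
    n = h * w - 1
    # Squared deviation of each position from its channel's spatial mean.
    mu = x.mean(axis=(1, 2), keepdims=True)
    d = (x - mu) ** 2
    # Channel-wise variance-like energy term.
    v = d.sum(axis=(1, 2), keepdims=True) / n
    # Higher deviation -> higher saliency; squash to (0, 1) with a sigmoid.
    e = d / (4.0 * (v + lam)) + 0.5
    weights = 1.0 / (1.0 + np.exp(-e))
    return x * weights

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 14, 14))   # toy (C, H, W) feature map
out = parameter_free_attention(feat)
```

Because the sigmoid output lies strictly in (0, 1), the module can only attenuate activations, emphasizing salient positions relative to the rest of the map rather than amplifying them.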
Published in: Engineering Applications of Artificial Intelligence