Abstract

Recently, 3D Convolutional Neural Network (3D-CNN) models with attention mechanisms have been widely studied for action recognition. Although most of these methods explore spatial, temporal, and channel attention, the inter-correlations across the spatial, temporal, and channel dimensions are not fully exploited. In this paper, we introduce a novel inter-dimensional correlations aggregated attention (ICAA) network that extracts inter-correlations between each pair of the spatial, temporal, and channel dimensions, as well as joint spatial-temporal-channel correlations, to obtain more comprehensive correlations. The proposed ICAA module is generic and can be easily plugged into state-of-the-art 3D-CNN models as well as multi-stream architectures for video action recognition. We extensively evaluate our method on the UCF-101 and HMDB-51 action recognition datasets, and the experimental results demonstrate that adding our ICAA module yields state-of-the-art performance of 98.4% on UCF-101 and 81.9% on HMDB-51, a significant improvement over the original models.

Highlights

  • Action recognition in video has been extensively investigated in computer vision, owing to its great potential in a wide range of applications such as intelligent surveillance, human-computer interaction, robotics, and healthcare

  • In order to model temporal information and motion patterns, 3D Convolutional Neural Network (3D-CNN) models have been developed for action recognition; they generate multiple channels of information from the input frames, and the final feature representation combines information from all channels [4]

  • We extensively evaluate our module on action recognition tasks over two popular datasets (UCF-101 and HMDB-51), and the experimental results demonstrate that our inter-dimensional correlations aggregated attention (ICAA) module obtains considerably improved performance compared against the original models and achieves state-of-the-art performance for the action recognition task


Summary

INTRODUCTION

Action recognition in video has been extensively investigated in computer vision, owing to its great potential in a wide range of applications such as intelligent surveillance, human-computer interaction, robotics, and healthcare. Zheng et al. [14] propose a novel global and local knowledge-aware attention network that incorporates two attention models and a global pooling (GP) model to make full use of their implicit complementary advantages: GP models capture global information, while attention models focus on significant details. Taking these into account, we conjecture that aggregated inter-dimensional correlations attention can enhance interpretability and provide rationales to recognize and explain the action. Hu et al. [15] introduce a Squeeze-and-Excitation module to exploit the inter-channel relationship, which squeezes global spatial information into a channel descriptor by using global average pooling to generate channel-wise statistics. Motivated by these facts, we propose a novel inter-dimensional correlations aggregated attention network, which makes full use of inter-correlation information for action recognition. We extensively evaluate our module on action recognition tasks over two popular datasets (UCF-101 and HMDB-51), and the experimental results demonstrate that our ICAA module obtains considerably improved performance compared against the original models and achieves state-of-the-art performance for the action recognition task
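The Squeeze-and-Excitation idea cited above (squeeze spatial information into a channel descriptor via global average pooling, then gate each channel) can be illustrated with a minimal NumPy sketch. This is not the paper's ICAA module; the function name, the reduction ratio, and the random weights are purely illustrative, and the input is assumed to be a single 3D-CNN feature map of shape (channels, time, height, width).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_channel_attention(feat, w1, w2):
    """SE-style channel attention (after Hu et al. [15]).

    feat: feature map of shape (C, T, H, W) from a 3D-CNN.
    w1:   first bottleneck FC weights, shape (C, C // r).
    w2:   second FC weights, shape (C // r, C).
    """
    c = feat.shape[0]
    # Squeeze: global average pooling over T, H, W -> channel descriptor (C,)
    z = feat.reshape(c, -1).mean(axis=1)
    # Excitation: bottleneck MLP (ReLU, then sigmoid gate in (0, 1))
    s = sigmoid(np.maximum(z @ w1, 0.0) @ w2)
    # Re-weight each input channel by its learned gate
    return feat * s.reshape(c, 1, 1, 1)

# Toy usage: 8 channels, reduction ratio r = 4, random (untrained) weights
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 7, 7))
w1 = rng.standard_normal((8, 2))
w2 = rng.standard_normal((2, 8))
y = se_channel_attention(x, w1, w2)
```

Because the sigmoid gate lies strictly between 0 and 1, the module can only attenuate channels, never amplify them; an extension to the temporal or joint spatial-temporal-channel correlations the paper describes would pool and gate over the corresponding dimensions instead.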

RELATED WORK
INTER-DIMENSIONAL CORRELATIONS AGGREGATED
SPATIAL AND CHANNEL ATTENTION SUB-MODULE
EXPERIMENTS
Findings
CONCLUSION
