Abstract

In this paper, we propose a new attention model that integrates multi-scale features for human action recognition. We extract multi-scale features through convolution kernels of different sizes in both the spatial and temporal domains. The spatial attention model considers the relationship between local detail and the action as a whole, so our model can focus on the significant parts of an action in the spatial domain. The temporal attention model accounts for the speed of an action, so our model can concentrate on the pivotal clips of an action in the temporal domain. We verify the validity of multi-scale features on the benchmark action recognition datasets UCF-101 (\(88.8\%\)), HMDB-51 (\(60.0\%\)) and Penn (\(96.3\%\)). As a result, our model outperforms previous methods in accuracy.
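The temporal side of the idea above can be illustrated in a minimal NumPy sketch. This is not the authors' implementation: the averaging kernels, the scalar scoring, and the function names below are illustrative assumptions. It only shows the general pattern of convolving per-frame features with kernels of several sizes (each size corresponding to a different action speed) and using the fused responses to weight frames via a softmax over time.

```python
import numpy as np

def conv1d_same(x, k):
    # 'same' 1-D convolution along the time axis (x: [T, C], k: kernel length).
    # A fixed averaging kernel stands in for a learned one (assumption).
    w = np.ones(k) / k
    return np.stack([np.convolve(x[:, c], w, mode="same")
                     for c in range(x.shape[1])], axis=1)

def multi_scale_temporal_attention(feats, kernel_sizes=(1, 3, 5)):
    """feats: [T, C] per-frame features. Returns an attention-pooled feature [C]."""
    # Multi-scale responses: each kernel size covers a different temporal extent.
    scales = [conv1d_same(feats, k) for k in kernel_sizes]
    fused = np.concatenate(scales, axis=1)          # [T, C * n_scales]
    scores = fused.mean(axis=1)                     # [T], one scalar score per frame
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over time
    return (weights[:, None] * feats).sum(axis=0)   # weighted temporal pooling

T, C = 8, 4
feats = np.random.default_rng(0).normal(size=(T, C))
pooled = multi_scale_temporal_attention(feats)
print(pooled.shape)  # (4,)
```

In a full model the kernels would be learned and a spatial counterpart would weight regions within each frame; here the sketch only conveys how multiple kernel sizes feed one attention distribution over clips.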
