Abstract

In this paper, we propose a new attention model that integrates multi-scale features for human action recognition. We extract multi-scale features through convolution kernels of different sizes in both the spatial and temporal domains. The spatial attention model considers the relationship between local detail and the action as a whole, so our model can focus on the significant parts of an action in the spatial domain. The temporal attention model accounts for the speed of an action, so our model can concentrate on the pivotal clips of an action in the temporal domain. We verify the validity of multi-scale features on the benchmark action recognition datasets UCF-101 (\(88.8\%\)), HMDB-51 (\(60.0\%\)) and Penn (\(96.3\%\)). As a result, our model outperforms previous methods in accuracy.
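The temporal side of the idea above can be illustrated in a minimal NumPy sketch. This is not the authors' implementation: the averaging kernels, the scalar scoring, and the function names below are illustrative assumptions. It only shows the general pattern of convolving per-frame features with kernels of several sizes (each size corresponding to a different action speed) and using the fused responses to weight frames via a softmax over time.

```python
import numpy as np

def conv1d_same(x, k):
    # 'same' 1-D convolution along the time axis (x: [T, C], k: kernel length).
    # A fixed averaging kernel stands in for a learned one (assumption).
    w = np.ones(k) / k
    return np.stack([np.convolve(x[:, c], w, mode="same")
                     for c in range(x.shape[1])], axis=1)

def multi_scale_temporal_attention(feats, kernel_sizes=(1, 3, 5)):
    """feats: [T, C] per-frame features. Returns an attention-pooled feature [C]."""
    # Multi-scale responses: each kernel size covers a different temporal extent.
    scales = [conv1d_same(feats, k) for k in kernel_sizes]
    fused = np.concatenate(scales, axis=1)          # [T, C * n_scales]
    scores = fused.mean(axis=1)                     # [T], one scalar score per frame
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over time
    return (weights[:, None] * feats).sum(axis=0)   # weighted temporal pooling

T, C = 8, 4
feats = np.random.default_rng(0).normal(size=(T, C))
pooled = multi_scale_temporal_attention(feats)
print(pooled.shape)  # (4,)
```

In a full model the kernels would be learned and a spatial counterpart would weight regions within each frame; here the sketch only conveys how multiple kernel sizes feed one attention distribution over clips.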
