Attention Focused Spatial Pyramid Pooling for Boxless Action Recognition in Still Images

Weijiang Feng,Xuhui Huang,Xiang Zhang,Zhigang Luo

doi:10.1007/978-3-319-68612-7_65

Abstract

Existing approaches for still image based action recognition rely heavily on bounding boxes and could be restricted to specific applications with bounding boxes available. Thus, exploring the boxless action recognition in still images is very challenging for lack of any supervised knowledge. To address this issue, we propose an attention focused spatial pyramid pooling (SPP) network (AttSPP-net) free from the bounding boxes by jointly integrating the soft attention mechanism and SPP into a convolutional neural network. Particularly, soft attention mechanism automatically indicates relevant image regions to be an action. Besides, AttSPP-net further exploits SPP to boost the robustness to action deformation by capturing spatial structures among image pixels. Experiments on two public action recognition benchmark datasets including PASCAL VOC 2012 and Stanford-40 demonstrate that AttSPP-net can achieve promising results and even outweighs some methods based on ground-truth bounding boxes, and provides an alternative way towards practical applications.

Full Text