Abstract

Existing approaches to action recognition in still images rely heavily on bounding boxes, which restricts them to applications where such boxes are available. Recognizing actions in still images without bounding boxes is considerably more challenging, because no supervised localization information is provided. To address this issue, we propose an attention-focused spatial pyramid pooling (SPP) network (AttSPP-net) that requires no bounding boxes, jointly integrating a soft attention mechanism and SPP into a convolutional neural network. In particular, the soft attention mechanism automatically highlights the image regions relevant to an action. In addition, AttSPP-net exploits SPP to improve robustness to action deformation by capturing spatial structure among image pixels. Experiments on two public action recognition benchmarks, PASCAL VOC 2012 and Stanford-40, demonstrate that AttSPP-net achieves promising results, even outperforming some methods that rely on ground-truth bounding boxes, and provides an alternative route toward practical applications.
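The two components named above can be illustrated with a minimal NumPy sketch. It is not the authors' AttSPP-net implementation. It assumes a (C, H, W) CNN feature map, scores each spatial location with a hypothetical 1x1 projection vector `w`, applies a softmax over all locations to obtain a soft attention map, and then max-pools the reweighted map over 1x1, 2x2 and 4x4 grids (assumed pyramid levels). The result is a fixed-length vector regardless of the input's spatial size.

```python
import numpy as np

def soft_attention(features, w):
    """Reweight a (C, H, W) feature map by a softmax attention map over spatial
    positions. `w` is a hypothetical (C,) vector acting as a 1x1 conv scorer."""
    logits = np.tensordot(w, features, axes=([0], [0]))  # (H, W) location scores
    logits = logits - logits.max()  # subtract max for numerical stability
    attn = np.exp(logits)
    attn /= attn.sum()  # softmax over all H*W locations
    return features * attn[None, :, :], attn

def spatial_pyramid_pool(features, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) map into n x n bins for each pyramid level n and
    concatenate, giving C * sum(n * n) values independent of H and W."""
    C, H, W = features.shape
    pooled = []
    for n in levels:
        # Integer bin edges; every bin is non-empty as long as H, W >= n
        h_edges = [(k * H) // n for k in range(n + 1)]
        w_edges = [(k * W) // n for k in range(n + 1)]
        for i in range(n):
            for j in range(n):
                region = features[:, h_edges[i]:h_edges[i + 1], w_edges[j]:w_edges[j + 1]]
                pooled.append(region.max(axis=(1, 2)))  # one value per channel
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
for H, W in [(7, 7), (13, 9)]:
    feats = rng.random((8, H, W))
    attended, attn = soft_attention(feats, rng.random(8))
    vec = spatial_pyramid_pool(attended)
    print(vec.shape)  # (168,) for both sizes: 8 channels * (1 + 4 + 16) bins
```

The fixed output length is what lets an SPP layer feed a fully connected classifier from differently sized images. The attention map only reweights locations, so it adds no box supervision.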
