Abstract

We address the problem of recognizing actions in still images and propose an approach that arranges the features of different semantic parts in spatial order. Our approach has three components: (1) a semantic learning algorithm that collects a set of part detectors, (2) an efficient detection method that divides multiple images by the same grid and evaluates them in parallel, and (3) a top-down spatial arrangement that increases the inter-class variance. The proposed semantic part learning algorithm captures both interactive objects and discriminative poses. Our spatial arrangement can be seen as a kind of adaptive pyramid that highlights the spatial distribution of body parts in different actions and provides more discriminative representations. Experimental results show that our approach significantly outperforms the state of the art on two challenging benchmarks: PASCAL VOC 2012 (by 2.6% mAP) and Stanford-40 (by 5.2% mAP).
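As a rough illustration of dividing multiple images by the same grid and arranging part features in spatial order, the sketch below max-pools part-detector response maps over a fixed grid shared by a batch of images. It is not the method from the paper: the function name `grid_pool`, the input layout `(n_images, n_parts, H, W)`, the use of max-pooling, and the fixed (non-adaptive) grid are all assumptions made for illustration.

```python
import numpy as np


def grid_pool(responses, grid=(2, 2)):
    """Max-pool part-detector responses over one grid shared by all images.

    responses: array of shape (n_images, n_parts, H, W); hypothetical layout.
    grid:      (rows, cols) of the shared grid; H and W must be divisible.
    returns:   array of shape (n_images, rows * cols * n_parts), ordered
               cell by cell (row-major), with all parts grouped per cell.
    """
    n, p, h, w = responses.shape
    rows, cols = grid
    if h % rows or w % cols:
        raise ValueError("H and W must be divisible by the grid size")
    # Split H into (rows, h // rows) and W into (cols, w // cols).
    cells = responses.reshape(n, p, rows, h // rows, cols, w // cols)
    # Take the strongest response of each part inside each cell.
    pooled = cells.max(axis=(3, 5))  # shape (n, p, rows, cols)
    # Put the spatial axes first so the features follow spatial order.
    return pooled.transpose(0, 2, 3, 1).reshape(n, rows * cols * p)


# Toy batch: 2 images, 3 part detectors, 4x4 response maps.
responses = np.zeros((2, 3, 4, 4))
responses[0, 1, 0, 3] = 5.0  # part 1 fires in the top-right cell of image 0
features = grid_pool(responses, grid=(2, 2))
```

Because every image is cut by the same grid, the whole batch is pooled in one vectorized call. An adaptive pyramid as described in the abstract would choose the cell layout per action rather than fixing it in advance.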
