Abstract

Recognition of general actions has witnessed great success in recent years. However, existing general action representations do not work well for fine-grained actions, which usually share high similarities in both appearance and motion pattern. To address this problem, we introduce a visual attention mechanism into the proposed descriptor, termed Actionness-pooled Deep-convolutional Descriptor (ADD). Instead of pooling features uniformly over the entire video, we aggregate features in sub-regions that are more likely to contain actions, as indicated by actionness maps. This endows ADD with a superior capability to capture the subtle differences between fine-grained actions. We conduct experiments on the HIT Dances dataset, one of the few existing datasets for fine-grained action analysis. Quantitative results demonstrate that ADD substantially outperforms traditional CNN-based representations. Extensive experiments on two general action benchmarks, JHMDB and UCF101, further show that combining ADD with an end-to-end ConvNet can boost recognition performance. Moreover, leveraging ADD, we reveal the sparsity characteristic inherent in actions and point out a potential direction for designing more effective action analysis models by extracting action patterns that are both representative and discriminative.
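To make the core idea concrete, below is a minimal sketch of actionness-weighted pooling as described above, contrasted with uniform pooling. It operates on a single frame's convolutional feature map for simplicity; the array shapes, the function name `actionness_pooled_descriptor`, and the sum-to-one normalization are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch (not the authors' code): pool a ConvNet feature map
# using an actionness map as spatial attention weights. Assumed shapes:
# `features` is (C, H, W); `actionness` is (H, W) with non-negative scores.
import numpy as np

def actionness_pooled_descriptor(features: np.ndarray,
                                 actionness: np.ndarray,
                                 eps: float = 1e-8) -> np.ndarray:
    """Aggregate per-location features weighted by actionness scores.

    Uniform average pooling is the special case of a constant actionness
    map; here, locations more likely to contain the action contribute more.
    """
    # Normalize the actionness map so the spatial weights sum to 1.
    weights = actionness / (actionness.sum() + eps)          # (H, W)
    # Weighted sum over spatial locations yields one C-dim descriptor.
    return (features * weights[None, :, :]).sum(axis=(1, 2))  # (C,)

# Toy usage: 512 channels over a 7x7 grid, random actionness scores.
feats = np.random.rand(512, 7, 7).astype(np.float32)
act = np.random.rand(7, 7).astype(np.float32)
add = actionness_pooled_descriptor(feats, act)
print(add.shape)  # (512,)
```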
