Action recognition through discovering distinctive action parts.

Feifei Chen, Xiaoqin Kuang, Haitao Gan, Nong Sang, Changxin Gao

doi:10.1364/josaa.32.000173

Abstract

Recent methods based on midlevel visual concepts have shown promising capabilities in human action recognition. However, automatically discovering semantic entities such as action parts remains challenging. In this paper, we present a method for automatically discovering distinctive midlevel action parts from video for the recognition of human actions. We address this problem by learning and selecting a collection of discriminative and representative action part detectors directly from video data. We first train a large collection of candidate exemplar linear discriminant analysis (exemplar-LDA) detectors from clusters obtained by clustering spatiotemporal patches in whitened space. To select the most effective detectors from this large pool of candidates, we propose novel coverage-entropy curves (CE curves) to evaluate a detector's capability of distinguishing actions. The CE curves characterize the correlation between the representative and discriminative power of detectors. In the experiments, we apply the mined part detectors as a visual vocabulary to the task of action recognition on four datasets: KTH, Olympic Sports, UCF50, and HMDB51. The experimental results demonstrate the effectiveness of the proposed method and show that it achieves state-of-the-art recognition performance.
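To make the two ingredients named in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard LDA-style detector form w = Σ⁻¹(x̄ − μ), where μ and Σ are the mean and covariance of all patch features, and it assumes one plausible reading of a coverage-entropy curve: for the top-k detections of a detector, coverage is the fraction of patches fired on and entropy is the class-label entropy of those detections. The abstract does not give the exact definitions, so the function names, the regularizer `reg`, and the curve definition are all hypothetical.

```python
import numpy as np


def exemplar_lda_detector(cluster_patches, mean, cov, reg=1e-3):
    """Linear detector w = (cov + reg*I)^-1 (cluster mean - global mean).

    Assumed form of an LDA-style detector; `reg` is a hypothetical
    ridge term that keeps the covariance invertible.
    """
    dim = cov.shape[0]
    sigma = cov + reg * np.eye(dim)
    # Solving the linear system avoids forming an explicit inverse.
    return np.linalg.solve(sigma, cluster_patches.mean(axis=0) - mean)


def coverage_entropy_curve(scores, labels, n_classes, ks):
    """Return (coverage, entropy) points for the top-k detections.

    Assumed definition: coverage = k / number of patches, entropy =
    Shannon entropy (bits) of action labels among the top-k patches.
    Low entropy at high coverage suggests a detector that is both
    representative and discriminative.
    """
    order = np.argsort(-scores)  # highest detector score first
    points = []
    for k in ks:
        top_labels = labels[order[:k]]
        p = np.bincount(top_labels, minlength=n_classes) / k
        nonzero = p[p > 0]
        entropy = float(-(nonzero * np.log2(nonzero)).sum())
        points.append((k / len(scores), entropy))
    return points


# Toy usage with made-up 2-D features and two action classes.
patches = np.array([[1.0, 2.0], [3.0, 4.0]])
w = exemplar_lda_detector(patches, np.zeros(2), np.eye(2), reg=0.0)
scores = np.array([0.9, 0.8, 0.7, 0.1])
labels = np.array([0, 0, 1, 1])
curve = coverage_entropy_curve(scores, labels, n_classes=2, ks=[2, 4])
```

Under this assumed definition, a detector whose entropy stays low as coverage grows fires on many patches that mostly belong to one action, which is the trade-off between representative and discriminative power that the abstract attributes to CE curves.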
