Abstract

Recent methods based on mid-level visual concepts have shown promising capabilities in human action recognition. However, automatically discovering semantic entities such as action parts remains challenging. In this paper, we present a method for automatically discovering distinctive mid-level action parts from video for human action recognition. We address this problem by learning and selecting a collection of discriminative and representative action-part detectors directly from video data. We first train a large pool of candidate exemplar linear discriminant analysis (exemplar-LDA) detectors from clusters obtained by clustering spatiotemporal patches in whitened space. To select the most effective detectors from this vast pool of candidates, we propose novel coverage-entropy curves (CE curves) that evaluate a detector's capability to distinguish actions; a CE curve characterizes the correlation between a detector's representative and discriminative power. In the experiments, we apply the mined part detectors as a visual vocabulary to action recognition on four datasets: KTH, Olympic Sports, UCF50, and HMDB51. The experimental results demonstrate the effectiveness of the proposed method and show state-of-the-art recognition performance.
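The two core ingredients named above, closed-form exemplar-LDA detectors and a coverage-entropy sweep, can be sketched as follows. This is a minimal illustration with synthetic features, not the paper's implementation: the feature dimension, the Gaussian stand-ins for patch descriptors, and the exact CE formulation (coverage = fraction of clips a detector fires on at a threshold; entropy = class entropy of those firings) are all assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Background statistics (hypothetical stand-in: in practice these would be
# pooled spatiotemporal patch descriptors, e.g. HOG/HOF-style features).
D = 32                                   # assumed feature dimension
background = rng.normal(size=(5000, D))  # synthetic patch features
mu = background.mean(axis=0)
# Regularized covariance so the inverse (whitening) is well conditioned.
sigma = np.cov(background, rowvar=False) + 1e-3 * np.eye(D)
sigma_inv = np.linalg.inv(sigma)

def train_elda(exemplar):
    """Exemplar-LDA: one closed-form detector per exemplar patch,
    w = Sigma^{-1}(x_e - mu), i.e. matching in whitened feature space."""
    return sigma_inv @ (exemplar - mu)

def score(w, x):
    return w @ (x - mu)

# Train one candidate detector from a single "action part" exemplar.
exemplar = rng.normal(loc=1.0, size=D)   # synthetic exemplar patch
w = train_elda(exemplar)

# Coverage-entropy sweep (assumed formulation): as the detection threshold
# rises, coverage (fraction of clips fired on) falls; a good detector keeps
# the class entropy of its firings low while coverage is still high.
clips = rng.normal(size=(200, D))        # synthetic clip features
labels = rng.integers(0, 4, size=200)    # synthetic action labels
scores = clips @ w - mu @ w

def coverage_entropy(threshold):
    fired = scores >= threshold
    coverage = float(fired.mean())
    if not fired.any():
        return coverage, 0.0
    p = np.bincount(labels[fired], minlength=4) / fired.sum()
    p = p[p > 0]
    return coverage, float(-(p * np.log2(p)).sum())

curve = [coverage_entropy(t)
         for t in np.linspace(scores.min(), scores.max(), 10)]
```

Selecting detectors then amounts to comparing these curves across candidates: a detector whose entropy stays low over a wide coverage range is both representative and discriminative.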
