Abstract
Recent methods based on midlevel visual concepts have shown promising capabilities in the human action recognition field. Automatically discovering semantic entities such as action parts remains challenging. In this paper, we present a method of automatically discovering distinctive midlevel action parts from video for recognition of human actions. We address this problem by learning and selecting a collection of discriminative and representative action part detectors directly from video data. We initially train a large collection of candidate exemplar-linear discriminant analysis detectors from clusters obtained by clustering spatiotemporal patches in whitened space. To select the most effective detectors from the vast array of candidates, we propose novel coverage-entropy curves (CE curves) to evaluate a detector's capability of distinguishing actions. The CE curves characterize the correlation between the representative and discriminative power of detectors. In the experiments, we apply the mined part detectors as a visual vocabulary to the task of action recognition on four datasets: KTH, Olympic Sports, UCF50, and HMDB51. The experimental results demonstrate the effectiveness of the proposed method and show the state-of-the-art recognition performance.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: Journal of the Optical Society of America. A, Optics, image science, and vision
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.