Abstract

Human activity prediction aims to recognize an unfinished activity from limited appearance and motion information. In this paper, we propose to predict an incomplete activity by combining mid-level action units with discriminative key frames extracted from each activity class. Specifically, we extract a large number of action-related volumes from activity videos. Based on a set of powerful low-level features, similar volumes are aggregated into a mid-level feature, named an action unit. We then detect these action units in each activity video and generate a frame feature by computing the distribution of concurrent action units in a single frame. Notably, humans can easily recognize an incomplete activity from a few key frames composed of representative, interrelated action units. The key frames in each activity class are selected by computing the entropy of each frame feature. Finally, a structured SVM is trained to recognize activities at different observation ratios. The proposed approach is evaluated on several publicly available datasets and compared with state-of-the-art approaches. The experimental results and analysis demonstrate the effectiveness of the proposed approach.
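
The frame feature and entropy-based key-frame selection described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that action-unit detections are already available as integer IDs per frame, that the frame feature is a normalized histogram over those IDs, and that entropy is Shannon entropy in bits. The abstract does not state whether low- or high-entropy frames are preferred, so the ranking direction is exposed as a hypothetical parameter `prefer_low_entropy`. All function names here are illustrative.

```python
import math
from collections import Counter


def frame_feature(detected_units, num_units):
    """Normalized histogram of the action-unit IDs detected in one frame.

    Assumption: a frame is represented as a list of integer unit IDs
    in the range [0, num_units).
    """
    counts = Counter(detected_units)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_units
    return [counts.get(u, 0) / total for u in range(num_units)]


def shannon_entropy(distribution):
    """Shannon entropy in bits; zero-probability bins contribute nothing."""
    return -sum(p * math.log2(p) for p in distribution if p > 0)


def select_key_frames(frames, num_units, k, prefer_low_entropy=True):
    """Return the indices of up to k frames ranked by frame-feature entropy.

    Assumption: frames with no detected action units are skipped, and the
    ranking direction is a free choice (the abstract does not specify it).
    """
    scored = [
        (shannon_entropy(frame_feature(units, num_units)), index)
        for index, units in enumerate(frames)
        if units  # skip frames without any detection
    ]
    scored.sort(reverse=not prefer_low_entropy)
    return sorted(index for _, index in scored[:k])


# Toy input: four frames over a vocabulary of four action units.
frames = [[0, 0, 0], [0, 1, 2, 3], [1, 1, 2], []]
key_frames = select_key_frames(frames, num_units=4, k=2)
```

In this toy input, the first frame contains a single action unit (entropy 0) and the second spreads evenly over all four units (entropy 2 bits), so ranking toward low entropy favors frames dominated by a few units. The selected frame features would then serve as input to a classifier such as the structured SVM, which is outside the scope of this sketch.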

