Abstract

In this paper we present an approach for finding space-time activity maps in a video shot using 3D moment methods. An RGB-D video involving a specific human activity is first regularly partitioned into multiple video shots in which human activities can be defined. Each video shot is then separated into multiple video cubes, each of which characterizes local object shape and motion. Given a local video cube, the proposed space-time pattern detector extracts both spatial and temporal symmetry information, which is further grouped by hashing to construct an activity map describing the distribution of object motion vectors in the shot. The intrinsic human activity in a video consisting of multiple shots is thus represented by a set of activity maps. Next, to reduce the temporal dimensionality of an activity represented by activity maps, the kernel PCA method is applied to transform the activity representation into a set of principal activity maps. Finally, regardless of the activity types of the training videos, all training principal activity maps are clustered into multiple clusters to generate a principal activity map dictionary. This dictionary is used to solve the initial-pose problem when dynamic programming is used to align two sequences of principal activity maps for recognizing human activities in RGB-D videos. The proposed approach was tested on publicly available datasets. Experimental results demonstrate the good performance of the proposed method in terms of activity detection accuracy and execution speed.
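To make two of the pipeline's steps concrete, the following is a minimal sketch, assuming flattened activity maps stored as T x D arrays: kernel PCA reduces a shot's sequence of activity maps to principal activity maps, and a standard dynamic-programming (DTW) recurrence aligns two reduced sequences. All names (reduce_activity_maps, dtw_distance), parameters, and the RBF kernel choice are illustrative assumptions, not the authors' implementation; the dictionary-based initial-pose step described in the abstract is omitted here.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def reduce_activity_maps(maps, n_components=8):
    """Project T flattened activity maps (T x D) onto principal activity maps."""
    # Kernel PCA with an RBF kernel; kernel and component count are assumptions.
    kpca = KernelPCA(n_components=n_components, kernel="rbf")
    return kpca.fit_transform(maps)  # shape (T, n_components)

def dtw_distance(a, b):
    """Dynamic-programming alignment cost between two map sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # frame-to-frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Toy usage: two videos' activity-map sequences (T frames x D map bins).
rng = np.random.default_rng(0)
query = reduce_activity_maps(rng.random((40, 256)))
reference = reduce_activity_maps(rng.random((55, 256)))
print("alignment cost:", dtw_distance(query, reference))
```

Note that reducing each sequence independently yields projections in different principal spaces; in the paper's pipeline, quantizing against the shared principal activity map dictionary is what makes cross-video alignment well posed.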
