Abstract

In this paper, we propose a method for recognizing human actions independently of viewpoint by developing 4D space–time features that generalize the information from a finite number of views in the training phase, so as to achieve satisfactory performance on arbitrary test views. These 4D space–time interest points (4D-STIPs, [x, y, z, t]) are extracted from 3D space volumes reconstructed from images of a finite number of different views. Since the proposed features are constructed from volumetric information, features for an arbitrary 2D viewpoint at test time can be generated by projecting the 3D space volumes and 4D-STIPs onto the corresponding test image planes. This enables action recognition from any camera viewpoint even after training with images from only a finite number of views. We also propose a variant of 3D space–time interest points that takes the simultaneous gradient variation in all three dimensions into account in order to focus on the motion of important spatial corner points. The 3D space volumes and 4D-STIPs can be projected onto arbitrary viewpoints when training each action, giving the classifier generalization capability. From these projected features, we construct motion history images and non-motion history images, which encode the moving and non-moving parts of an action, respectively. After reducing the feature dimension, the final features are learned with the support vector data description method. In experiments, we train the models on the IXMAS dataset, constructed from five views, and test them on a new SNU dataset made for evaluating generalization performance on arbitrary-view videos.
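The following is a minimal illustrative sketch, not the authors' implementation, of the two projection-related steps the abstract outlines: projecting the spatial part of 4D-STIPs onto an arbitrary camera view with a known 3x4 projection matrix, and accumulating a simple motion history image from the projected points. The projection matrix P, image size, and decay length tau below are illustrative assumptions.

import numpy as np

def project_stips(stips_xyzt, P):
    """Project the spatial part of 4D-STIPs (x, y, z, t) to pixels (u, v, t)
    with a 3x4 camera projection matrix P (assumed known per viewpoint)."""
    xyz1 = np.hstack([stips_xyzt[:, :3], np.ones((len(stips_xyzt), 1))])  # homogeneous coords
    uvw = xyz1 @ P.T                                                      # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]                                         # perspective divide
    return np.hstack([uv, stips_xyzt[:, 3:4]])                            # (u, v, t)

def motion_history_image(proj_stips, shape=(240, 320), tau=20.0):
    """Accumulate a simple MHI: recent interest points are bright, older ones decay."""
    mhi = np.zeros(shape, dtype=np.float32)
    t_max = proj_stips[:, 2].max()
    for u, v, t in proj_stips:
        r, c = int(round(v)), int(round(u))
        if 0 <= r < shape[0] and 0 <= c < shape[1]:
            mhi[r, c] = max(mhi[r, c], max(0.0, 1.0 - (t_max - t) / tau))
    return mhi

# Toy usage with synthetic 4D-STIPs and a hypothetical pinhole camera (identity pose).
stips = np.array([[0.1, 0.2, 2.0, 1.0],
                  [0.0, 0.3, 2.2, 5.0],
                  [-0.1, 0.1, 1.9, 9.0]])
P = np.array([[500., 0., 160., 0.],
              [0., 500., 120., 0.],
              [0., 0., 1., 0.]])
mhi = motion_history_image(project_stips(stips, P))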
