Abstract

Automatic estimation of 3D shape similarity from video is an important component of human action analysis, but it is also a challenging task because of variations in body topology and the high dimensionality of the pose configuration space. We consider the problem of 3D shape similarity in 3D video sequences across different actors and motions. Most current approaches use conventional global features as shape descriptors and define shape similarity using the L2 distance. However, such methods are limited to coarse representations and do not sufficiently reflect pose similarity as humans perceive it. In this paper, we present a novel 3D human pose descriptor called Extremal Human Curves (EHC), extracted from both the spatial and the topological dimensions of the body surface. To compare two shapes, we use an elastic metric in shape space between their descriptors, based on static features, and then perform temporal convolutions, thereby capturing the pose information encoded in multiple adjacent frames. We quantitatively analyze the effectiveness of our descriptors for both 3D shape similarity in video and content-based pose retrieval for static shapes, and show that each one can contribute, sometimes substantially, to more reliable human shape and pose analysis. Experimental results comparing recognition performance against several state-of-the-art methods are promising and indicate that the proposed approach is robust and accurate.
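
The temporal step described above can be illustrated with a minimal sketch. It assumes a static frame-to-frame distance matrix has already been computed (for example by the elastic metric between EHC descriptors, which is not implemented here) and smooths it along the diagonal with a uniform window. The abstract does not specify the kernel or window size, so the uniform weighting, the function name `temporal_filter`, and the parameter `half_window` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def temporal_filter(static_dist, half_window):
    """Average static frame-to-frame distances along the diagonal.

    static_dist[i, j] is a precomputed static distance between frame i of
    one sequence and frame j of another. Offsets that fall outside either
    sequence are skipped, so border entries average over fewer terms.
    The uniform window is an assumption made for illustration.
    """
    static_dist = np.asarray(static_dist, dtype=float)
    n, m = static_dist.shape
    total = np.zeros((n, m))
    count = np.zeros((n, m))
    for k in range(-half_window, half_window + 1):
        # Index ranges for which both (i + k) and (j + k) stay in bounds.
        i0, i1 = max(0, -k), min(n, n - k)
        j0, j1 = max(0, -k), min(m, m - k)
        if i0 >= i1 or j0 >= j1:
            continue
        total[i0:i1, j0:j1] += static_dist[i0 + k:i1 + k, j0 + k:j1 + k]
        count[i0:i1, j0:j1] += 1.0
    # The k = 0 offset always contributes, so count is at least 1 everywhere.
    return total / count
```

With `half_window = 0` the function returns the static distances unchanged; larger windows make the similarity between two frames depend on how well their neighbouring frames match as well.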
