Abstract

This paper presents a robust computational framework for monocular 3D tracking of human movement. The main innovation of the proposed framework is to exploit the underlying structure of the body silhouette and pose spaces by constructing low-dimensional silhouette and pose manifolds, establishing inter-manifold mappings, and performing tracking in these manifolds using a particle filter. In addition, a novel vectorized silhouette descriptor is introduced to achieve a low-dimensional, noise-resilient silhouette representation. The proposed articulated motion tracker is view-independent, self-initializing, and capable of maintaining multiple kinematic trajectories. By using the learned mapping from the silhouette manifold to the pose manifold, particle sampling is informed by the current image observation, resulting in improved sample efficiency. Promising tracking results have been obtained on both synthetic and real videos.
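The tracking loop described in the abstract can be pictured with a toy particle filter operating in a low-dimensional pose latent space. The sketch below is a minimal, assumption-laden illustration, not the paper's implementation: the 3-dimensional manifold, the linear silhouette_to_pose stand-in, and the Gaussian likelihood are hypothetical placeholders for the learned inter-manifold mapping, the GPDM dynamics, and a silhouette-matching likelihood.

    import numpy as np

    rng = np.random.default_rng(0)

    D = 3    # assumed dimensionality of the learned pose manifold
    N = 200  # number of particles

    def silhouette_to_pose(sil):
        """Placeholder for the learned silhouette-manifold -> pose-manifold mapping."""
        W = np.ones((D, sil.size)) / sil.size   # hypothetical fixed regressor
        return W @ sil

    def likelihood(particles, sil):
        """Toy Gaussian likelihood around the regressed pose; the paper's
        actual likelihood compares predicted and observed silhouettes."""
        z = silhouette_to_pose(sil)
        d2 = np.sum((particles - z) ** 2, axis=1)
        return np.exp(-0.5 * d2 / 0.1)

    def step(particles, weights, sil, mix=0.5):
        # Dynamics proposal: random walk on the manifold (GPDM would go here).
        dyn = particles + 0.05 * rng.standard_normal(particles.shape)
        # Observation-informed proposal: sample around the regressed pose.
        z = silhouette_to_pose(sil)
        obs = z + 0.05 * rng.standard_normal(particles.shape)
        # Mix the two proposals so the current image informs the samples.
        use_obs = rng.random(len(particles)) < mix
        proposed = np.where(use_obs[:, None], obs, dyn)
        w = weights * likelihood(proposed, sil)
        w /= w.sum()
        # Systematic resampling.
        idx = np.searchsorted(np.cumsum(w),
                              (rng.random() + np.arange(len(w))) / len(w))
        return proposed[idx], np.full(len(w), 1.0 / len(w))

    particles = rng.standard_normal((N, D))
    weights = np.full(N, 1.0 / N)
    for t in range(10):
        sil = rng.random(64)   # stand-in silhouette descriptor for frame t
        particles, weights = step(particles, weights, sil)
    print("posterior mean pose (latent):", particles.mean(axis=0))

Mixing a dynamics-driven proposal with an observation-informed one is what lets the current image guide sampling, which is the source of the sample-efficiency gain claimed above.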

Highlights

  • Reliable recovery and tracking of articulated human motion from video is a very challenging problem in computer vision, due to the versatility of human movement, the variability of body types, varied movement styles and signatures, and the 3D nature of the human body

  • To examine the smoothness of the three shape descriptors, the original Gaussian mixture model (GMM), the vectorized GMM, and shape context, we inspect the resulting manifolds after dimension reduction and dynamics learning with Gaussian process dynamical models (GPDM); see the descriptor sketch after this list

  • For each view and each method, given an input silhouette, we found the smallest root mean square error (RMSE) among the 15 candidate poses provided by the Bayesian mixture of experts (BME); see the RMSE sketch after this list
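The vectorized-GMM descriptor named in the second highlight can be pictured as follows. This is one plausible reading only, using scikit-learn's GaussianMixture as a stand-in: fit a GMM to normalized foreground pixel coordinates and flatten its sorted parameters into a fixed-length vector. The paper's actual construction may differ.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def vectorized_gmm_descriptor(mask, n_components=8, seed=0):
        """Fit a GMM to the silhouette's foreground pixels and flatten its
        parameters into a fixed-length vector (a hypothetical reading of
        the vectorized-GMM descriptor)."""
        ys, xs = np.nonzero(mask)
        pts = np.column_stack([xs, ys]).astype(float)
        # Normalize for rough translation/scale invariance.
        pts = (pts - pts.mean(axis=0)) / (pts.std() + 1e-9)
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type='full',
                              random_state=seed).fit(pts)
        # Sort components by x-mean so vectorization is consistent.
        order = np.argsort(gmm.means_[:, 0])
        return np.concatenate([gmm.weights_[order],
                               gmm.means_[order].ravel(),
                               gmm.covariances_[order].ravel()])

    # Toy binary silhouette: a filled ellipse.
    h, w = 64, 48
    yy, xx = np.mgrid[0:h, 0:w]
    mask = ((xx - 24) / 14) ** 2 + ((yy - 32) / 28) ** 2 <= 1.0
    desc = vectorized_gmm_descriptor(mask)
    print("descriptor length:", desc.size)   # 8 + 8*2 + 8*4 = 56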
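The best-of-K evaluation in the third highlight reduces to computing the RMSE between each BME candidate pose and the ground truth and keeping the minimum. A minimal sketch with synthetic stand-in data:

    import numpy as np

    def min_rmse(candidates, ground_truth):
        """Smallest RMSE over a set of candidate poses
        (e.g. the 15 BME hypotheses).

        candidates:   (K, D) array of candidate pose vectors
        ground_truth: (D,)   true pose vector
        """
        errors = candidates - ground_truth
        rmses = np.sqrt(np.mean(errors ** 2, axis=1))
        return rmses.min(), int(np.argmin(rmses))

    rng = np.random.default_rng(2)
    truth = rng.random(30)                             # stand-in pose vector
    cands = truth + 0.1 * rng.standard_normal((15, 30))
    best_rmse, best_idx = min_rmse(cands, truth)
    print(f"best candidate {best_idx} with RMSE {best_rmse:.4f}")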


Summary

Introduction

Reliable recovery and tracking of articulated human motion from video is a very challenging problem in computer vision, due to the versatility of human movement, the variability of body types, varied movement styles and signatures, and the 3D nature of the human body. When training data are available, articulated motion tracking can be cast as a statistical learning and inference problem. Generative approaches, for example [2,3,4], usually assume knowledge of a 3D body model of the subject and dynamical models of the related movement, from which kinematic predictions and corresponding image observations can be generated. Although generative methods exploit movement dynamics and produce more accurate tracking results, they are more time-consuming, and the conditional distribution of the kinematics given the current image observation is usually not used directly. Discriminative methods, in contrast, learn such conditional distributions of kinematics given image observations from training data and often yield fast image-based kinematic inference, but the rich temporal correlation of body kinematics between adjacent frames is left unused during tracking.
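A discriminative mapping of the kind just described, a conditional distribution of kinematics given the image observation, can be sketched as a small mixture-of-experts regressor. The dimensions, linear experts, and softmax gating below are illustrative assumptions, not the paper's trained BME model.

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy dimensions; the real descriptors and pose vectors are larger.
    DX, DY, K = 16, 6, 3   # observation dim, pose dim, number of experts

    # Hypothetical learned parameters of a mixture-of-experts regressor
    # p(pose | x) = sum_k g_k(x) * N(pose; W_k x, sigma^2 I).
    W = rng.standard_normal((K, DY, DX)) * 0.1   # per-expert linear maps
    V = rng.standard_normal((K, DX)) * 0.1       # gating weights

    def predict(x):
        """Return the K candidate poses and their gate probabilities."""
        logits = V @ x
        gates = np.exp(logits - logits.max())    # stable softmax
        gates /= gates.sum()
        candidates = np.einsum('kij,j->ki', W, x)
        return candidates, gates

    x = rng.random(DX)              # stand-in image descriptor
    poses, gates = predict(x)
    best = poses[np.argmax(gates)]  # most probable pose hypothesis
    print("gate probabilities:", np.round(gates, 3))
    print("top pose hypothesis:", np.round(best, 3))

Keeping all K hypotheses rather than only the top one is what allows a tracker to maintain multiple kinematic trajectories, as the abstract notes.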

