Abstract
We demonstrate how 3D head tracking and pose estimation can be achieved effectively and efficiently from noisy RGB-D sequences. Our proposal leverages a random forest framework designed to regress the 3D head pose at every frame in a temporal tracking manner. A distinctive feature of the algorithm is that it combines (1) a generic training dataset of 3D head models, from which the forest is learned once offline; and (2) an online refinement with subject-specific 3D data, which lets the tracker withstand slight facial deformations and adapt its forest to the specific characteristics of an individual subject. The combination of these two components makes our algorithm robust even under extreme poses, where the user's face is no longer visible in the image. Finally, we also propose a second solution based on a multi-camera system, in which data acquired simultaneously from multiple RGB-D sensors helps the tracker handle challenging conditions that affect only a subset of the cameras. Notably, the proposed multi-camera framework runs in real time, at approximately 8 ms per frame with six cameras on one CPU core, and its cost scales linearly with the number of cameras, reaching 30 fps with 25 cameras.
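The two timing figures quoted above agree with each other under a simple linear cost model. The following is a rough consistency check, not a result from the paper: it assumes the per-frame time is strictly proportional to the number of cameras with no fixed overhead, and the symbols t and N are introduced here for illustration only.

```latex
% Assumption: per-frame time t(N) is proportional to the camera count N (no fixed overhead).
\begin{align*}
t(N)  &\approx \frac{8\ \text{ms}}{6}\, N \approx 1.33\, N\ \text{ms}, \\
t(25) &\approx 1.33 \times 25\ \text{ms} \approx 33.3\ \text{ms}, \\
\frac{1}{t(25)} &\approx \frac{1000\ \text{ms/s}}{33.3\ \text{ms}} \approx 30\ \text{fps}.
\end{align*}
```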