Abstract

Template warping is a popular technique in vision-based 3D motion tracking and 3D pose estimation because it can be applied to monocular video sequences. However, the method suffers from two major limitations that hamper its successful use in practice. First, it requires the camera to be calibrated before the method is applied. Second, it may fail to provide good results if the inter-frame displacements are too large. To overcome the first problem, we propose to estimate the unknown focal length of the camera from several initial frames by an iterative optimization process. To alleviate the second problem, we propose a tracking method that combines the complementary information provided by dense optical flow and tracked scale-invariant feature transform (SIFT) features. While optical flow handles small displacements well and provides accurate local information, tracked SIFT features are better at handling larger displacements and global transformations. To combine these two sources of complementary information, we introduce a forgetting factor to bootstrap the 3D pose estimates provided by SIFT features and refine the final results using optical flow. Experiments are performed on three public databases: the Biwi Head Pose dataset, the BU dataset, and the McGill Faces dataset. The results show that the proposed solution is more accurate than baseline methods that rely solely on either template warping or SIFT features. In addition, because it circumvents the need for camera calibration, the approach can be applied in a wider variety of scenarios, providing a more flexible solution to the problem than existing methods.
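To make the forgetting-factor fusion concrete, the sketch below blends a coarse SIFT-based pose estimate with a locally accurate optical-flow estimate, decaying the factor over time so flow gradually dominates once tracking has stabilized. This is only a minimal illustration under assumed conventions (6-DoF pose vectors, synthetic per-frame estimates standing in for the actual SIFT and optical-flow trackers, and illustrative variable names), not the authors' implementation.

import numpy as np

def fuse_pose(pose_sift, pose_flow, alpha):
    """Blend two 6-DoF pose estimates (3 rotation + 3 translation
    parameters) with a forgetting factor alpha in [0, 1]: alpha near 1
    trusts the SIFT-based estimate (robust to large displacements),
    alpha near 0 trusts the optical-flow estimate (accurate locally)."""
    return alpha * np.asarray(pose_sift) + (1.0 - alpha) * np.asarray(pose_flow)

# Synthetic per-frame estimates standing in for the two trackers; in
# practice these would come from SIFT matching and dense optical flow
# against the warped template.
rng = np.random.default_rng(0)
true_pose = np.array([0.10, -0.20, 0.05, 1.0, 0.5, 3.0])
alpha, decay = 0.9, 0.95      # forgetting factor and its per-frame decay
pose = np.zeros(6)            # initial pose (rx, ry, rz, tx, ty, tz)
for t in range(50):
    pose_sift = true_pose + rng.normal(0.0, 0.05, 6)  # coarse, global
    pose_flow = true_pose + rng.normal(0.0, 0.01, 6)  # fine, local
    pose = fuse_pose(pose_sift, pose_flow, alpha)     # refine with flow
    alpha *= decay            # let optical flow dominate over time

print("final pose estimate:", np.round(pose, 3))

The decaying alpha captures the bootstrapping role described above: SIFT features anchor the estimate early on or after large displacements, while optical flow provides the final refinement.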
