Abstract
Egocentric vision has recently gained popularity, opening new avenues for human-centric applications. Egocentric fisheye cameras provide wide-angle coverage, but they introduce image distortion and strong self-occlusion of the human body, both of which pose significant challenges for data processing and model reconstruction. Unlike previous work that relies only on synthetic data for model training, this paper presents a new real-world EgoCentric Human Pose (ECHP) dataset. To avoid the difficulty of collecting 3D ground truth with motion capture systems, we simultaneously collect images from a head-mounted egocentric fisheye camera and from two third-person-view cameras, circumventing the environmental restrictions. Using self-supervised learning under multi-view constraints, we propose a simple yet effective framework, EgoFish3D, for egocentric 3D pose estimation from a single image in different real-world scenarios. EgoFish3D comprises three main modules: 1) the third-person-view module takes two exocentric images as input and estimates the 3D pose in the third-person camera frame; 2) the egocentric module predicts the 3D pose in the egocentric camera frame; and 3) the interactive module estimates the rotation matrix between the third-person and egocentric views. Experimental results on our ECHP dataset and existing benchmark datasets demonstrate the effectiveness of EgoFish3D, which can achieve performance superior to existing methods.
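As an illustration of the multi-view constraint described above, the following minimal sketch rotates an egocentric 3D pose into the third-person camera frame and measures how well it agrees with the third-person estimate. It is not the loss used in EgoFish3D: the function names, the root-centering step, the choice of joint 0 as root, and the 15-joint toy skeleton are all assumptions made for this example.

```python
import numpy as np


def root_center(pose, root=0):
    """Subtract the root joint so the pose is translation-free (assumed convention)."""
    return pose - pose[root]


def view_consistency_error(pose_ego, pose_exo, rot_ego_to_exo, root=0):
    """Mean per-joint distance after rotating the egocentric pose into the
    third-person frame. Poses are (J, 3) arrays; rot_ego_to_exo is (3, 3)."""
    # Row-vector form of p_exo = R @ p_ego for every joint.
    ego_in_exo = root_center(pose_ego, root) @ rot_ego_to_exo.T
    diff = ego_in_exo - root_center(pose_exo, root)
    return float(np.linalg.norm(diff, axis=1).mean())


# Toy data: a random 15-joint pose seen from two frames related by a
# 90-degree rotation about the z-axis plus an arbitrary translation.
rng = np.random.default_rng(0)
pose_exo = rng.normal(size=(15, 3))
theta = np.pi / 2
R = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
# Inverse mapping p_ego = R^T @ p_exo (rows: pose_exo @ R), then shift.
pose_ego = pose_exo @ R + np.array([0.3, -1.2, 0.5])

error_correct = view_consistency_error(pose_ego, pose_exo, R)
error_wrong = view_consistency_error(pose_ego, pose_exo, np.eye(3))
```

With the correct rotation the two poses coincide up to floating-point error, while a wrong rotation (here the identity) leaves a clearly non-zero residual. A residual of this kind is what lets a rotation estimate tie the two views together without motion-capture ground truth.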