We envision a telepresence system that enhances remote work by facilitating both physical and immersive visual interactions between individuals. However, during robot teleoperation, communication often lacks realism because users see the robot's body rather than the remote individual. To address this, we propose a method for overlaying a digital human model onto a humanoid robot using XR visualization, enabling an immersive 3D telepresence experience. Our approach employs a learning-based method to estimate the 2D poses of the humanoid robot from head-worn stereo views, leveraging a newly collected dataset of full-body humanoid robot poses. The stereo 2D poses and sparse inertial measurements from the remote operator are jointly optimized to compute 3D poses over time. The digital human is then localized from the perspective of a continuously moving observer using the estimated 3D pose of the humanoid robot. Our moving-camera pose estimation method requires neither markers nor external knowledge of the robot's status, thereby avoiding marker occlusion, calibration issues, and sensitivity to headset-tracking errors. We demonstrate the system in a remote physical-training scenario, achieving real-time performance at 40 fps and thus enabling simultaneous immersive and physical interactions. Experimental results show that our learning-based 3D pose estimation method, which operates without prior knowledge of the robot, significantly outperforms alternative approaches that require the robot's global pose, particularly during rapid headset movements, and achieves markerless digital-human augmentation from head-worn views.
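To make the stereo step concrete, the following is a minimal sketch, not the authors' implementation, of its geometric core: lifting matched 2D joint detections from the two head-worn views to 3D via direct linear triangulation. The names (`triangulate_joint`, `P_left`, `P_right`) and the calibrated-stereo assumption are hypothetical; the paper's full method additionally fuses the operator's sparse inertial measurements and optimizes poses over time, which this sketch omits.

```python
# Minimal sketch (assumed setup, not the paper's code): DLT triangulation
# of a humanoid robot's 2D joint detections from a calibrated stereo pair.
import numpy as np

def triangulate_joint(p_left, p_right, P_left, P_right):
    """Triangulate one joint from its two 2D detections.

    p_left, p_right: (u, v) pixel coordinates in the left/right views.
    P_left, P_right: 3x4 camera projection matrices (assumed known).
    Returns the 3D joint position in the shared camera frame.
    """
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.stack([
        p_left[0] * P_left[2] - P_left[0],
        p_left[1] * P_left[2] - P_left[1],
        p_right[0] * P_right[2] - P_right[0],
        p_right[1] * P_right[2] - P_right[1],
    ])
    # Least-squares solution: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def triangulate_pose(joints_left, joints_right, P_left, P_right):
    """Lift all detected 2D joints of the robot to a 3D pose estimate."""
    return np.array([
        triangulate_joint(pl, pr, P_left, P_right)
        for pl, pr in zip(joints_left, joints_right)
    ])
```

In the system described above, such per-frame stereo estimates would feed the temporal optimization together with the inertial measurements, rather than being used directly.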