In this paper, we propose a novel approach to egocentric 3D human pose estimation from fisheye images captured by a head-mounted display (HMD). Most prior studies on 3D pose estimation have focused on heatmap regression and on lifting 2D information to 3D space. This paper addresses the depth ambiguity that arises with highly distorted 2D fisheye images by proposing the SegDepth module, which jointly regresses segmentation and depth maps from the image. Through segmentation, the SegDepth module isolates the human silhouette, which is directly related to pose estimation, and it simultaneously estimates depth to resolve the depth ambiguity. The extracted segmentation and depth information is transformed into embeddings and used for 3D joint estimation. In the evaluation, the SegDepth module improves the performance of existing methods, which suggests that it can be integrated into well-established methods such as Mo2Cap2 and xR-EgoPose to improve their 3D pose estimation.
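The data flow described above (joint segmentation and depth regression, conversion to embeddings, then 3D joint estimation) can be illustrated with a minimal sketch. It is not the paper's implementation: the per-pixel heads, the grid-pooling embedding, the untrained random regressor weights, the joint count of 16, and every function name below are assumptions made only for illustration.

```python
import math
import random

NUM_JOINTS = 16  # assumed joint count for illustration, not taken from the paper


def seg_depth_heads(image, w_seg=1.0, w_depth=1.0):
    """Toy stand-in for joint regression: per-pixel silhouette probability and depth."""
    # Sigmoid keeps the silhouette probability in (0, 1).
    seg = [[1.0 / (1.0 + math.exp(-w_seg * p)) for p in row] for row in image]
    # Softplus keeps the predicted depth strictly positive.
    depth = [[math.log1p(math.exp(w_depth * p)) for p in row] for row in image]
    return seg, depth


def grid_embed(fmap, cells=2):
    """Average-pool a 2D map into cells x cells values and flatten them (hypothetical embedding)."""
    h, w = len(fmap), len(fmap[0])
    ch, cw = h // cells, w // cells
    out = []
    for i in range(cells):
        for j in range(cells):
            block = [fmap[y][x]
                     for y in range(i * ch, (i + 1) * ch)
                     for x in range(j * cw, (j + 1) * cw)]
            out.append(sum(block) / len(block))
    return out


def regress_joints(embedding, weights):
    """Linear map from the embedding to NUM_JOINTS (x, y, z) triples."""
    flat = [sum(w * e for w, e in zip(row, embedding)) for row in weights]
    return [flat[k:k + 3] for k in range(0, len(flat), 3)]


def estimate_pose(image, rng):
    """Run the toy pipeline: heads -> embeddings -> 3D joints (weights are random, untrained)."""
    seg, depth = seg_depth_heads(image)
    embedding = grid_embed(seg) + grid_embed(depth)  # concatenate both embeddings
    weights = [[rng.uniform(-1, 1) for _ in embedding]
               for _ in range(NUM_JOINTS * 3)]
    return seg, depth, regress_joints(embedding, weights)


rng = random.Random(0)
image = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(8)]  # fake 8x8 input
seg, depth, joints = estimate_pose(image, rng)
print(len(joints), len(joints[0]))
```

In a real system the two heads would share a learned image encoder and the regressor would be trained. The sketch only shows how both maps feed one embedding that drives the joint prediction.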