Abstract

We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap system, a volumetric capture setup and an audio recording system. By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, HUMAN4D provides a diverse set of motions and poses encountered in single- and multi-person daily, physical and social activities (jumping, dancing, etc.), along with multi-RGBD (mRGBD), volumetric and audio data. Despite the existence of multi-view color datasets captured with hardware (HW) synchronization, to the best of our knowledge, HUMAN4D is the first and only public resource that provides volumetric depth maps with high synchronization precision, owing to the use of intra- and inter-sensor HW-SYNC. Moreover, a spatio-temporally aligned, scanned and rigged 3D character complements HUMAN4D to enable joint research on time-varying and high-quality dynamic meshes. We provide evaluation baselines by benchmarking HUMAN4D with state-of-the-art human pose estimation and 3D compression methods. For the former, we apply 2D and 3D pose estimation algorithms on both single- and multi-view data cues. For the latter, we benchmark open-source 3D codecs on volumetric data with respect to online volumetric video encoding and steady bit rates. Furthermore, qualitative and quantitative visual comparison between mesh-based volumetric data reconstructed at different qualities showcases the available options with respect to 4D representations. HUMAN4D is introduced to the computer vision and graphics research communities to enable joint research on spatio-temporally aligned pose, volumetric, mRGBD and audio data cues. The dataset and its code are available at https://tofis.github.io/myurls/human4d.
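To make the dataset's structure concrete, here is a minimal sketch of loading one HW-synchronized mRGBD frame set. The directory layout, sequence name, view count and depth encoding below are assumptions for illustration only; the actual file organization and loaders are documented at the project page linked above.

```python
import cv2  # pip install opencv-python

# Hypothetical layout -- HUMAN4D's real directory structure and file naming
# are documented at the project page; everything below is an assumption.
SEQUENCE_DIR = "human4d/S1_jumping"
NUM_VIEWS = 4      # number of RGBD views (assumed)
FRAME_ID = 120     # frame index within the sequence

def load_mrgbd_frame(seq_dir: str, frame_id: int, num_views: int) -> list:
    """Load one HW-synchronized multi-RGBD frame set (per-view color + depth)."""
    views = []
    for v in range(num_views):
        color = cv2.imread(f"{seq_dir}/view{v}/color/{frame_id:06d}.png")
        # Depth maps are commonly stored as 16-bit PNGs in millimeters
        # (an assumption here); IMREAD_UNCHANGED preserves the 16-bit values.
        depth = cv2.imread(f"{seq_dir}/view{v}/depth/{frame_id:06d}.png",
                           cv2.IMREAD_UNCHANGED)
        views.append({"color": color, "depth": depth})
    return views

frame_set = load_mrgbd_frame(SEQUENCE_DIR, FRAME_ID, NUM_VIEWS)
```

Because the sensors are HW-synchronized both intra- and inter-device, a single frame index can address temporally aligned color and depth across all views, which is what makes per-frame volumetric reconstruction feasible.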

Highlights

  • Inhabitance in a 4D world of moving 3D objects of various shapes and colors increases the need to capture and extensively study, analyze and exploit the 4D data around us, especially with the massive development of low-cost sensing devices [1]

  • Consistent with the outcomes on other public datasets, AlphaPose outperforms OpenPose, showing higher accuracy on both the single- and multi-person benchmarking sets of HUMAN4D (see the metric sketch after this list). Even though both methods showcase lower accuracy on the multi-person data of H4D2, which is much more challenging due to the occlusions between the subjects, it is worth noting that the difference between the single- and multi-person results of OpenPose is low (∼1.5%), while AlphaPose presents a higher drop of approximately 9%

  • In order to provide extra information to the reader, along with the results on HUMAN4D, we indicate the related outcomes of the methods on other datasets, i.e., MPII [42] and COCO [56]
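The accuracy percentages cited above belong to the percentage-of-correct-keypoints family of metrics used by MPII-style 2D pose benchmarks. As a hedged illustration of how such a score is computed (the joint count, threshold alpha and reference scale below are assumptions for the example, not the paper's exact protocol):

```python
import numpy as np

def pck(pred: np.ndarray, gt: np.ndarray, ref_scale: float,
        alpha: float = 0.5) -> float:
    """Percentage of Correct Keypoints: a joint counts as correct when its
    distance to the ground truth is below alpha * ref_scale.

    pred, gt:  (J, 2) arrays of 2D joint positions for one person.
    ref_scale: per-person reference length (head-segment size for PCKh).
    """
    dists = np.linalg.norm(pred - gt, axis=1)
    return float(np.mean(dists < alpha * ref_scale))

# Toy usage with simulated detections (J = 17 COCO-style keypoints).
rng = np.random.default_rng(0)
gt = rng.uniform(0, 512, size=(17, 2))          # ground-truth joints (pixels)
pred = gt + rng.normal(0, 4, size=(17, 2))      # detector output with noise
print(f"[email protected]: {pck(pred, gt, ref_scale=60.0):.3f}")
```

On MPII the usual reference scale is the annotated head-segment length (PCKh); per-person bounding-box size is a common alternative when head annotations are unavailable.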


Introduction

Inhabitance in a 4D world of moving 3D objects of various shapes and colors increases the need to capture and extensively study, analyze and exploit the 4D data around us, especially with the massive development of low-cost sensing devices [1]. Volumetric video of humans, captured with the aid of multiple cameras, and scanned 3D characters, animated with the use of motion capture (MoCap) technologies, comprise the core elements for human-centric 4D media production, a domain essential in several technological and industrial sectors. These technologies constitute key elements in immersive experiences that provide remote virtual presence and co-presence (e.g., XR conferencing [2], XR museums [3], etc.). The experiences are further enhanced by augmenting the virtual and immersive worlds with photorealistic representations that enable highly natural and realistic audiovisual communication between multiple users.

