Abstract

Reconstructing dynamic scenes with commodity depth cameras has many applications in computer graphics, computer vision, and robotics. However, due to noisy and erroneous observations from the capture device and the inherently ill-posed nature of non-rigid registration with insufficient information, traditional approaches often produce low-quality geometry with holes, bumps, and misalignments. We propose a novel 3D dynamic reconstruction system, named HDR-Net-Fusion, which learns to simultaneously reconstruct and refine the geometry on the fly with a sparse embedded deformation graph of surfels, using a hierarchical deep reinforcement (HDR) network. The network comprises two parts: a global HDR-Net, which rapidly detects local regions with large geometric errors, and a local HDR-Net, which serves as a local patch refinement operator that promptly completes and enhances such regions. Training the global HDR-Net is formulated as a novel reinforcement learning problem, so that the network implicitly learns a region selection strategy that improves the overall reconstruction quality. The applicability and efficiency of our approach are demonstrated on a large-scale dynamic reconstruction dataset, where our method reconstructs geometry of higher quality than traditional methods.
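
The two-level design above can be summarized as a refinement loop. The sketch below is purely illustrative: the interfaces (select_regions, extract_patch, refine, replace_patch) are hypothetical stand-ins for the paper's components, not its actual API.

    def refine_on_the_fly(surfels, global_net, local_net, num_regions=8):
        """Illustrative top-level loop for the two-level HDR-Net design.

        All method names here are hypothetical stand-ins: the global
        network proposes regions with large geometric error, and the
        local network refines each region as an independent patch.
        """
        regions = global_net.select_regions(surfels, k=num_regions)  # selection strategy learned via RL
        for region in regions:
            patch = surfels.extract_patch(region)   # local surfel neighborhood
            refined = local_net.refine(patch)       # complete and denoise the patch
            surfels.replace_patch(region, refined)  # write the result back
        return surfels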

Highlights

  • While research on reconstructing and modeling static indoor scenes [8] has matured in the past few years, reconstructing dynamic objects remains an open problem in both the graphics and robotics communities

  • The seminal work of DynamicFusion [14] described a general pipeline adopted by many subsequent algorithms: by parameterizing the per-frame deformation as a warp field defined on a sparse set of transformation nodes skinned from the full geometry, these methods register the underlying shape to the depth observations at a particular time by solving a non-rigid iterative closest point (NR-ICP) problem [15], yielding the transformation for every node (a minimal sketch of this warp-field skinning follows this list)

  • In order to maintain system efficiency while exploiting the power of deep learning models, we present HDR-Net-Fusion, a highly efficient dynamic 3D reconstruction system based on a surfel representation [30], which can reconstruct and refine dynamic scenes simultaneously with a hierarchical deep reinforcement network, called HDR-Net
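
As a concrete illustration of the warp-field parameterization described in the second highlight, the sketch below warps a single surface point by blending the rigid transforms of its k nearest deformation nodes. The Gaussian skinning weights, the value of sigma, and all variable names are assumptions chosen for clarity, not details taken from the paper.

    import numpy as np

    def warp_point(p, node_pos, node_rot, node_trans, k=4, sigma=0.05):
        """Warp one surface point with an embedded deformation graph.

        p          : (3,) surface point
        node_pos   : (N, 3) node positions g_i
        node_rot   : (N, 3, 3) per-node rotations R_i
        node_trans : (N, 3) per-node translations t_i

        Node i contributes the rigidly transformed point
        R_i (p - g_i) + g_i + t_i, blended with Gaussian weights.
        """
        dist = np.linalg.norm(node_pos - p, axis=1)
        nearest = np.argsort(dist)[:k]                        # k nearest nodes
        w = np.exp(-dist[nearest] ** 2 / (2.0 * sigma ** 2))  # Gaussian skinning weights
        w /= w.sum()
        warped = np.zeros(3)
        for wi, i in zip(w, nearest):
            warped += wi * (node_rot[i] @ (p - node_pos[i]) + node_pos[i] + node_trans[i])
        return warped

Solving the NR-ICP problem then amounts to optimizing the per-node rotations and translations so that warped model points match the observed depth.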

Summary

Introduction

While research on reconstructing and modeling static indoor scenes [8] has matured in the past few years, reconstructing dynamic objects (e.g., humans, animals, and other freely moving objects) remains an open problem in both the graphics and robotics communities (referred to as dynamic SLAM [9,10,11,12,13]). Reconstructing scene dynamics is inherently ill-posed because the solution space for occluded regions, which are not observed by any camera, can be infinitely large [21]. Various regularization terms, such as as-rigid-as-possible constraints, mitigate this problem to some extent, but they are not always appropriate for real-world scenarios (a minimal sketch of such a constraint follows).
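
For concreteness, the as-rigid-as-possible constraint mentioned above typically penalizes, for every edge (i, j) of the deformation graph, the disagreement between where node i's transform sends node j and where node j actually moves. A minimal sketch, reusing the conventions of the warp-field example above (variable names are again assumptions):

    import numpy as np

    def arap_energy(node_pos, node_rot, node_trans, edges):
        """As-rigid-as-possible regularizer over deformation-graph edges.

        For each edge (i, j), node i's rigid transform predicts where
        node j should move; the energy penalizes deviation from node j's
        own motion:  || R_i (g_j - g_i) + g_i + t_i - (g_j + t_j) ||^2.
        """
        energy = 0.0
        for i, j in edges:
            pred = node_rot[i] @ (node_pos[j] - node_pos[i]) + node_pos[i] + node_trans[i]
            actual = node_pos[j] + node_trans[j]
            energy += np.sum((pred - actual) ** 2)
        return energy

Adding such a term to the registration objective biases the solver toward locally rigid motion precisely in the occluded regions where the problem is under-constrained.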

Dynamic reconstruction
Point set deep networks
Deep reinforcement learning
Overview
Notation and scene representation
Design
Challenges
Energy function
Concept
Network architecture
Loss function
Problem formulation
Learning algorithm and network architecture
Experimental setup
Training protocol
Parameter selection
Overall performance
Global-HDR-Net comparison
Local-HDR-Net comparison
Generalization
Limitations
Conclusions