Abstract

Depth images can be easily acquired using depth cameras. However, these images contain only partial information about a shape due to unavoidable self-occlusion. Thanks to the availability of large shape datasets, learning-based approaches can produce complete shapes from single depth images. State-of-the-art generative adversarial network (GAN) architectures can produce reasonable results; however, their reliance on relatively local convolutions restricts them from producing globally plausible shapes. In this study, we develop a novel dynamic latent code selection mechanism in which the model learns to select only the important codes from the latent space. Furthermore, we introduce a novel 3D self-attention (3DSA) layer that captures non-local relationships across the 3D space. We further design a GAN architecture that uses a multistage encoder–decoder to recover the shape, where our 3DSA layer is added to the discriminator to help attend to global features; this stabilizes model learning and encourages shape refinement, making our reconstructions more structurally plausible. Through extensive experiments, we demonstrate that our method outperforms other state-of-the-art methods for single-depth-image 3D reconstruction.
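To make the non-local idea concrete, here is a minimal NumPy sketch of self-attention over a 3D feature volume: every voxel's output is a weighted sum over all voxels, so relationships are captured regardless of spatial distance. The function name, the projection matrices `wq`, `wk`, `wv`, and the residual scaling `gamma` are illustrative assumptions for this sketch, not the paper's actual 3DSA implementation.

```python
import numpy as np

def self_attention_3d(x, wq, wk, wv, gamma=0.1):
    """Non-local self-attention over a (C, D, H, W) feature volume.

    Each voxel attends to every other voxel (hypothetical sketch of the
    3DSA idea; the paper's layer may differ in projections and scaling).
    """
    C, D, H, W = x.shape
    flat = x.reshape(C, -1)                 # (C, N) with N = D*H*W voxels
    q = wq @ flat                           # query projection, (Cq, N)
    k = wk @ flat                           # key projection,   (Cq, N)
    v = wv @ flat                           # value projection, (C, N)
    scores = q.T @ k                        # (N, N) pairwise voxel affinities
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn = scores / scores.sum(axis=1, keepdims=True)  # row-wise softmax
    out = v @ attn.T                        # aggregate values over all voxels
    # Residual connection: blend the attended features back into the input
    return x + gamma * out.reshape(C, D, H, W)
```

With `gamma = 0` the layer reduces to the identity, which is a common initialization for residual attention so that training starts from the plain convolutional behavior and gradually learns how much global context to mix in.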
