Abstract

Although prior methods have achieved promising performance in recovering 3D geometry from a single depth image, they tend to produce incomplete 3D shapes with noise. To this end, we propose the Multi-Scale Latent Feature-Aware Network (MLANet) to recover a full 3D voxel grid from a single depth view of an object. MLANet logically represents a 3D voxel grid as visible voxels, occluded voxels, and non-object voxels, and aims to reconstruct the latter two. MLANet first introduces a Multi-Scale Latent Feature-Aware (MLFA) based AutoEncoder (MLFA-AE) and a logical partition module to predict an occluded voxel grid (OccVoxGd) and a non-object voxel grid (NonVoxGd) from the visible voxel grid (VisVoxGd) corresponding to the input. MLANet then introduces an MLFA-based Generative Adversarial Network (MLFA-GAN) to refine the OccVoxGd and the NonVoxGd, and combines them with the VisVoxGd to generate the target 3D occupancy grid. MLFA learns multi-scale features of an object effectively and can serve as a plug-and-play component to improve existing networks. The logical partition helps suppress noise in the NonVoxGd and improve the accuracy of the OccVoxGd under adversarial constraints. Experiments on both synthetic and real-world data show that MLANet outperforms state-of-the-art methods and, in particular, reconstructs unseen object categories with higher accuracy.
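The logical partition described above can be illustrated with a minimal sketch: a 3D voxel grid is split into visible, occluded, and non-object parts, and the final occupancy grid is obtained by taking the union of the visible and occluded voxels while explicitly clearing non-object voxels. The function name, array shapes, and toy data below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def combine_partitions(vis_vox_gd, occ_vox_gd, non_vox_gd):
    """Merge the three logical partitions into one occupancy grid.

    Each input is a boolean array of shape (D, H, W). A voxel is
    occupied if it is visible or predicted as occluded; non-object
    voxels are forced to empty, which suppresses noise.
    (Illustrative sketch; not the paper's actual code.)
    """
    return (vis_vox_gd | occ_vox_gd) & ~non_vox_gd

# Toy 2x2x2 example: one visible voxel, one predicted occluded voxel,
# and one voxel predicted as non-object.
vis = np.zeros((2, 2, 2), dtype=bool); vis[0, 0, 0] = True
occ = np.zeros((2, 2, 2), dtype=bool); occ[1, 1, 1] = True
non = np.zeros((2, 2, 2), dtype=bool); non[0, 1, 0] = True

grid = combine_partitions(vis, occ, non)  # occupied at (0,0,0) and (1,1,1)
```

This combination step is what allows the network to focus its prediction capacity on the occluded and non-object partitions while passing visible voxels through unchanged.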
