Abstract

In recent years, learning-based approaches to 3D reconstruction have gained popularity due to their encouraging results. However, unlike 2D images, 3D data has no canonical representation that is both computationally lean and memory-efficient. Generating a 3D model directly from a single 2D image is even more challenging, because a single image provides only limited detail for 3D reconstruction. Existing learning-based techniques still lack the resolution, efficiency, and smoothness of 3D models required for many practical applications. In this paper, we propose two models for voxel-based 3D object reconstruction (V3DOR) from a single 2D image with better accuracy, one using autoencoders (AE) and another using variational autoencoders (VAE). In both models, the encoder learns a suitable compressed latent representation from a single 2D image, and the decoder generates the corresponding 3D model. Our contribution is twofold. First, to the best of the authors’ knowledge, this is the first time that variational autoencoders (VAE) have been employed for the 3D reconstruction problem. Second, the proposed models extract a discriminative set of features and generate a smoother, higher-resolution 3D model. To evaluate the efficacy of the proposed method, experiments were conducted on the benchmark ShapeNet data set. The results confirm that the proposed method outperforms state-of-the-art methods.
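To make the memory concern about voxel representations concrete, the following is a minimal NumPy sketch of a dense occupancy grid and how its storage grows with resolution. The grid sizes, the `float32` storage type, and the helper name `voxel_grid_bytes` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def voxel_grid_bytes(resolution, dtype=np.float32):
    """Bytes needed for a dense resolution^3 voxel grid of the given dtype."""
    return resolution ** 3 * np.dtype(dtype).itemsize

# Toy occupancy grid (assumed 32^3): a solid cube centred in the volume.
grid = np.zeros((32, 32, 32), dtype=bool)
grid[8:24, 8:24, 8:24] = True
occupied = int(grid.sum())  # number of occupied voxels

# Doubling the resolution multiplies the storage by eight.
for resolution in (32, 64, 128):
    print(resolution, voxel_grid_bytes(resolution))
```

Because storage grows with the cube of the resolution, a dense voxel output becomes expensive quickly, which is one reason high-resolution voxel reconstruction is difficult.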

Highlights

  • In recent years, imaging devices such as cameras have become common, and people have easy access to these devices; most of these devices can only capture the scene in 2D format

  • The proposed voxel-based 3D object reconstruction (V3DOR) approach consists of two different architectures, i.e., autoencoder (AE) and variational autoencoder (VAE)

  • In the 3D-VAEN approach, two encoded vectors, the mean and the standard deviation, are computed from the input in the encoding phase
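In a variational autoencoder, mean and standard deviation vectors like those in the last highlight are usually combined into a latent code by the reparameterization step. The sketch below shows that standard step in NumPy. The latent size of 128 and the names `sample_latent`, `mu`, and `sigma` are hypothetical; the excerpt does not specify how 3D-VAEN implements this.

```python
import numpy as np

def sample_latent(mu, sigma, rng):
    """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

rng = np.random.default_rng(0)

# Stand-ins for the two encoder outputs (assumed latent size: 128).
mu = np.zeros(128)
sigma = np.ones(128)

z = sample_latent(mu, sigma, rng)  # latent code passed to the decoder

# With zero standard deviation the sample collapses to the mean,
# which is how a plain autoencoder's deterministic code behaves.
z_deterministic = sample_latent(mu, np.zeros(128), rng)
```

Writing the sample as a deterministic function of `mu`, `sigma`, and independent noise is what allows gradients to flow back into the encoder during training.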


Summary

Introduction

In recent years, imaging devices such as cameras have become common and easily accessible, but most of them can only capture a scene in 2D. Implicit volumetric reconstruction or explicit mesh-based techniques were used for 3D reconstruction. In both cases, a large amount of input data and mathematical knowledge is required to estimate sufficient geometrical properties [4]. The first generation models the 3D-to-2D image projection process using mathematical and geometrical information together with a mathematical or algorithmic solution. Solutions of this type usually require multiple images captured with specially calibrated cameras. The second generation of 2D-to-3D model conversion uses accurately segmented 2D silhouettes. This generation produces reasonable 3D models, but it requires specially designed, calibrated cameras to capture images of the same object from many different angles. Techniques of this type are not practical because of the complex image-capturing procedures they require [10,17].

Methods
Results
Conclusion
