Abstract

Inferring the three-dimensional structure of objects from monocular images has far-reaching applications in the field of 3D perception. In this paper, we propose a self-supervised network (SSL-Net) to generate 3D point clouds from a single RGB image, unlike existing work that requires multiple views of the same object to recover the full 3D geometry. To provide an extra self-supervisory signal, the generated 3D model is simultaneously rendered into an image and compared with the input image. In addition, a pose estimation network is integrated into the 3D point cloud generation network to eliminate the pose ambiguity of the input image, and the estimated pose is also used for rendering, from the 3D point cloud, a 2D image with the same pose as the input image. Extensive experiments on both real and synthetic datasets show that our method not only qualitatively generates point clouds with more details but also quantitatively outperforms the state-of-the-art in accuracy.
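A minimal sketch of this self-supervisory signal, assuming a PyTorch setup in which the generated point cloud is splatted into a soft silhouette under the estimated pose and compared against a binary mask of the input view; the module names (point_cloud_net, pose_net), the orthographic soft renderer, and the loss below are illustrative stand-ins, not the authors' exact architecture:

```python
# Illustrative sketch only: names and the renderer are assumptions, not the paper's exact design.
import torch
import torch.nn.functional as F

def euler_to_rotation(angles):
    """Rotation matrix R = Rz(alpha) @ Ry(beta) @ Rx(gamma) from (B, 3) Euler angles."""
    a, b, g = angles.unbind(dim=-1)
    ca, sa, cb, sb, cg, sg = a.cos(), a.sin(), b.cos(), b.sin(), g.cos(), g.sin()
    rows = torch.stack([
        ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg,
        sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg,
        -sb,     cb * sg,                cb * cg,
    ], dim=-1)
    return rows.view(-1, 3, 3)

def render_silhouette(points, pose, size=64, sigma=5e-3):
    """Differentiably project a point cloud (B, N, 3) into a soft binary mask (B, size, size)."""
    R = euler_to_rotation(pose[:, :3])
    t = pose[:, 3:].unsqueeze(1)                        # (B, 1, 3) translation
    cam = points @ R.transpose(1, 2) + t                # points in the estimated camera frame
    xy = cam[..., :2]                                   # orthographic projection onto the image plane
    lin = torch.linspace(-1.0, 1.0, size, device=points.device)
    gy, gx = torch.meshgrid(lin, lin, indexing="ij")
    grid = torch.stack([gx, gy], dim=-1).reshape(1, size * size, 1, 2)
    d2 = ((grid - xy.unsqueeze(1)) ** 2).sum(-1)        # (B, size*size, N) squared distances
    # a pixel is "occupied" if any point lands near it (soft OR over points)
    occ = 1.0 - torch.prod(1.0 - torch.exp(-d2 / sigma), dim=-1)
    return occ.view(-1, size, size)

def self_supervised_step(image, silhouette, point_cloud_net, pose_net, optimizer):
    """One training step: generate a point cloud, estimate the pose, re-render, compare."""
    points = point_cloud_net(image)                     # (B, N, 3) generated 3D points
    pose = pose_net(image)                              # (B, 6) Euler angles + translation
    rendered = render_silhouette(points, pose)          # 2D view under the estimated pose
    loss = F.binary_cross_entropy(rendered.clamp(1e-6, 1 - 1e-6), silhouette)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key property is that the rendering step is differentiable, so the 2D comparison back-propagates into both the point cloud generator and the pose branch.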

Highlights

  • 3D shape perception is a fundamental theme in both human and computer vision

  • A pose estimation network is integrated into the 3D point cloud generation network to eliminate the pose ambiguity of the input image, and the estimated pose is used for rendering, from the 3D point cloud, a 2D image with the same pose as the input image

  • The ShapeNet dataset of Chang et al. [10], a collection of 3D CAD models organized by category, is used to train and evaluate the network

  • The camera pose is parameterized by three Euler angles (α, β, γ); the corresponding rotation matrix is sketched after this list
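
The six-dimensional pose vector mentioned in the summary plausibly decomposes into three Euler angles (α, β, γ) plus a translation; under that assumption, and using the common Z-Y-X convention, the rotation matrix reads as follows (the exact sign pattern depends on the angle ordering, which this excerpt does not fully determine):

\[
R(\alpha,\beta,\gamma) = R_z(\alpha)\,R_y(\beta)\,R_x(\gamma) =
\begin{bmatrix}
\cos\alpha\cos\beta & \cos\alpha\sin\beta\sin\gamma-\sin\alpha\cos\gamma & \cos\alpha\sin\beta\cos\gamma+\sin\alpha\sin\gamma\\
\sin\alpha\cos\beta & \sin\alpha\sin\beta\sin\gamma+\cos\alpha\cos\gamma & \sin\alpha\sin\beta\cos\gamma-\cos\alpha\sin\gamma\\
-\sin\beta & \cos\beta\sin\gamma & \cos\beta\cos\gamma
\end{bmatrix}
\]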


Summary

POINT-CLOUD GENERATION NETWORK WITH SELF-SUPERVISED LEARNING

The point cloud generation network reconstructs 3D point clouds from a single image (Section III-A), and the point cloud is transformed into a binary image by a binary image network (Section III-B). Features are extracted from the input image and its pose is estimated, yielding a 3D model aligned with a specific viewpoint. The pose estimation network shares its parameters with EI2 except for the last layer, which is a fully connected layer that outputs a six-dimensional vector representing the image pose. This pose estimation network is integrated into the 3D point cloud generation network to eliminate the pose ambiguity of the input image, and the estimated pose is used for rendering, from the 3D point cloud, a 2D image with the same pose as the input image.
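A minimal sketch of this weight-sharing arrangement, assuming EI2 is a convolutional image encoder; layer sizes and class names here are illustrative, not the paper's:

```python
# Illustrative sketch: EI2's internal structure is assumed, only the sharing pattern is from the text.
import torch.nn as nn

class ImageEncoder(nn.Module):
    """Stand-in for the EI2 image encoder: a conv backbone followed by a final head layer."""
    def __init__(self, feat_dim=256, out_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(feat_dim, out_dim)        # last layer of EI2

    def forward(self, x):
        return self.head(self.backbone(x))

class PoseEstimator(nn.Module):
    """Shares the EI2 backbone; only the last layer is replaced by a fully connected
    layer that outputs the six-dimensional pose vector."""
    def __init__(self, encoder: ImageEncoder):
        super().__init__()
        self.backbone = encoder.backbone                # parameters shared with EI2
        self.pose_fc = nn.Linear(encoder.head.in_features, 6)  # new final FC layer

    def forward(self, x):
        return self.pose_fc(self.backbone(x))
```

Reusing the EI2 backbone means the pose branch adds only a single fully connected layer of extra parameters; calling PoseEstimator(encoder) on an image batch returns the six-dimensional pose that drives the rendering step.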

