Abstract
Inferring the three-dimensional structure of objects from monocular images has far-reaching applications in 3D perception. In this paper, we propose a self-supervised network (SSL-Net) that generates 3D point clouds from a single RGB image, unlike existing work that requires multiple views of the same object to recover the full 3D geometry. To provide the extra self-supervisory signal, the generated 3D model is simultaneously rendered into an image and compared with the input image. In addition, a pose estimation network is integrated into the 3D point cloud generation network to eliminate the pose ambiguity of the input image, and the estimated pose is also used to render a 2D image from the 3D point cloud with the same pose as the input image. Extensive experiments on both real and synthetic datasets show that our method not only qualitatively generates point clouds with more detail but also quantitatively outperforms the state of the art in accuracy.
Highlights
3D shape perception is a fundamental theme in both human and computer vision
A pose estimation network is integrated into the 3D point cloud generation network to eliminate the pose ambiguity of the input image, and the estimated pose is used to render a 2D image from the 3D point cloud with the same pose as the input image
EXPERIMENTAL SETUP 1) DATA PREPARATION The ShapeNet dataset, provided by Chang et al. [10], is used to train and evaluate performance; it is a collection of 3D CAD models organized by category. The camera pose is parameterized by Euler angles (α, β, γ) via a standard rotation matrix
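The rotation-matrix entries above match the first row of a ZYX Euler-angle rotation. As a minimal sketch (assuming the convention R = Rz(α)·Ry(β)·Rx(γ); the paper's exact sign convention is not fully recoverable from this fragment), the matrix can be built as follows:

```python
import numpy as np

def euler_to_rotation(alpha, beta, gamma):
    """Build R = Rz(alpha) @ Ry(beta) @ Rx(gamma) (ZYX Euler convention).

    Under this assumed convention the first row is
    [cos a cos b,
     cos a sin b sin g - sin a cos g,
     cos a sin b cos g + sin a sin g],
    which matches the entries quoted in the text up to sign convention.
    """
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rz = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cg, -sg], [0.0, sg, cg]])
    return Rz @ Ry @ Rx
```

Any valid rotation built this way is orthonormal (R·Rᵀ = I, det R = 1), which is what makes it usable for transforming the point cloud into the estimated camera frame.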
Summary
The point cloud generation network reconstructs the 3D point cloud from a single image (Section III-A), and the point cloud is transformed into a binary image by a binary image network (Section III-B). Features are extracted from the input and the pose of the image is estimated, yielding a three-dimensional model at a specific viewpoint. The pose estimation network shares parameters with EI2 except for the last layer, which uses a fully connected layer to output a six-dimensional vector representing the image pose. A pose estimation network is integrated into the 3D point cloud generation network to eliminate the pose ambiguity of the input image, and the estimated pose is used to render a 2D image from the 3D point cloud with the same pose as the input image
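The self-supervisory signal described above rests on projecting the generated point cloud into a binary image under the estimated pose so it can be compared with the input view. A minimal sketch of such a projection (assuming a simple pinhole camera with points already scaled to roughly [-1, 1]; the paper's actual renderer is differentiable and not fully specified here):

```python
import numpy as np

def render_binary_mask(points, R, t, focal=1.0, size=64):
    """Project a point cloud into a binary silhouette image under pose (R, t).

    points: (N, 3) array of 3D points in object coordinates
    R:      (3, 3) rotation from the pose estimation branch
    t:      (3,)   translation placing the object in front of the camera
    Hypothetical illustration only; a real pipeline would use a soft,
    differentiable rasterizer so gradients reach the point generator.
    """
    cam = points @ R.T + t                 # transform into the camera frame
    cam = cam[cam[:, 2] > 1e-6]            # keep points in front of the camera
    uv = focal * cam[:, :2] / cam[:, 2:3]  # pinhole perspective projection
    # map normalized coordinates in [-1, 1] onto a size x size pixel grid
    pix = ((uv + 1.0) * 0.5 * (size - 1)).round().astype(int)
    mask = np.zeros((size, size), dtype=np.uint8)
    valid = (pix >= 0).all(axis=1) & (pix < size).all(axis=1)
    mask[pix[valid, 1], pix[valid, 0]] = 1
    return mask
```

The resulting mask can then be compared against a silhouette of the input image (e.g. with a per-pixel loss), which is the self-supervision the summary describes.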