Abstract

The advancement of deep learning has concentrated on deploying end-to-end solutions for high-dimensional data such as images. Recently, a number of methods have been proposed for reconstructing 3D objects using deep learning; one such technique recovers 3D objects as voxel grids from one or multiple images. However, very little work has been directed towards generating 3D objects represented as a set of points, i.e. a point cloud, from voxel grids, which are ambiguous and coarse. A deep learning model that generates detailed point clouds from coarse voxel grids has many practical uses: for example, design professionals could use such a model to turn a sketched, coarse voxel grid into a detailed 3D point cloud, supporting their creative work. This paper presents an algorithm to generate point clouds from voxel grids. The algorithm loads 3D objects directly into voxels, avoiding projection operations and the associated information loss. To obtain a comprehensive understanding of the voxel grid, the grid is analyzed from various angles, inspired by how humans observe 3D objects. The features from these views are passed into a GRU layer to extract patterns across views, then to a channel-wise convolutional layer and a graph convolution to generate the predicted point cloud. Experimental results indicate that the algorithm is capable of generating high-quality point clouds by understanding the semantic features of the voxel grid.
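The multi-angle observation step described above can be sketched in NumPy. Here `depth_views` is a hypothetical helper that renders six axis-aligned depth maps from a binary voxel grid; the paper's actual view extraction and the downstream GRU/convolution stages may differ.

```python
import numpy as np

def depth_views(voxels):
    """Render six axis-aligned depth maps from a binary voxel grid.

    For each of the three axes and both viewing directions, record the
    distance to the first occupied voxel along every ray; rays that hit
    nothing are assigned the maximum depth. This is a crude stand-in for
    observing the grid "from various angles".
    """
    views = []
    for axis in range(3):
        for flipped in (False, True):
            g = np.flip(voxels, axis=axis) if flipped else voxels
            hit = g.any(axis=axis)                      # does the ray hit anything?
            depth = np.argmax(g, axis=axis)             # index of first occupied voxel
            depth = np.where(hit, depth, g.shape[axis]) # empty rays -> max depth
            views.append(depth)
    return np.stack(views)  # shape (6, D, D)

# Toy input: an 8^3 grid containing a 4^3 cube.
grid = np.zeros((8, 8, 8), dtype=np.uint8)
grid[2:6, 2:6, 2:6] = 1
v = depth_views(grid)
```

In a full model, each of the six maps would be encoded (e.g. by a 2D CNN) and the resulting per-view feature vectors fed sequentially into the GRU layer.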

Highlights

  • In recent years, computer vision research has advanced rapidly due to the rise of deep learning technologies

  • We propose a novel end-to-end deep learning framework that synthesizes detailed 3D point clouds from a low-resolution voxel grid

  • Since the goal is to measure the difference between two point clouds while intersection-over-union operates directly on voxel grids, the predicted point clouds are converted into voxel grids for evaluation
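The conversion in the last highlight can be sketched as rasterizing an (N, 3) point cloud into a binary occupancy grid and scoring overlap with intersection-over-union. This is a minimal NumPy sketch; the resolution and coordinate bounds are assumptions, not values taken from the paper.

```python
import numpy as np

def voxelize(points, resolution=32, lo=-1.0, hi=1.0):
    """Rasterize an (N, 3) point cloud into a binary occupancy grid.

    Points are assumed to lie in the cube [lo, hi]^3; anything outside
    is clamped to the boundary cells.
    """
    idx = np.floor((points - lo) / (hi - lo) * resolution).astype(int)
    idx = np.clip(idx, 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

def iou(a, b):
    """Intersection-over-union between two binary voxel grids."""
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

# Usage: two points map to two distinct voxels; a grid matches itself exactly.
pts = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])
g = voxelize(pts)
```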


Introduction

Computer vision research has advanced rapidly due to the rise of deep learning technologies. Faster RCNN-based approaches have achieved great success in generic object detection on PASCAL and MS COCO [5]. Although RGB images are the classical visual signal in these tasks, real-world three-dimensional objects usually cannot be comprehensively captured in 2D space, so it is necessary and significant to investigate 3D data beyond 2D images. In medical imaging, results from computed tomography scans are usually reconstructed as 3D objects and analyzed as a whole, instead of as slices of images [6]. LiDAR and RGB-D cameras capture 3D information to localize targets and understand the environment [7].

