Abstract

3D scene sensing and understanding is a fundamental task in computer vision and robotics. One widely used representation for 3D data is a voxel grid. However, an explicit representation of 3D voxels typically requires a large amount of storage, which makes it poorly suited to lightweight applications such as robotic navigation and exploration. In this paper we propose a method to compress 3D voxel grids using an octree representation and Variational Autoencoders (VAEs). We first capture a 3D voxel grid, in our application with a RealSense D435 camera and a T265 camera working together. The voxel grid is then decomposed into three types of octants, which are compressed by the encoder and reproduced by feeding the latent code into the decoder. We demonstrate the efficiency of our method in two applications: scene reconstruction and path planning.
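
The octree decomposition step can be illustrated with a minimal sketch. The abstract does not name the three octant types, so the labels "empty", "full", and "mixed" below are an assumption, as are the function name `classify_octants`, the `min_size` parameter, and the choice to represent the grid as a set of occupied voxel coordinates. Under this assumption, only the "mixed" octants would carry detail worth passing to a learned encoder; this is not the authors' implementation.

```python
from collections import Counter


def classify_octants(occupied, grid_size, min_size=8):
    """Split a cubic voxel grid into octree leaves (illustrative only).

    occupied  : iterable of (x, y, z) integer coordinates of occupied voxels
    grid_size : edge length of the grid, assumed to be a power of two
    min_size  : smallest octant edge length; partly filled octants of this
                size are kept as 'mixed' leaves instead of being split

    Returns a list of (origin, size, label) tuples, where label is one of
    'empty', 'full', or 'mixed' (hypothetical names, not from the paper).
    """
    leaves = []

    def visit(origin, size, points):
        if not points:
            leaves.append((origin, size, "empty"))
        elif len(points) == size ** 3:
            leaves.append((origin, size, "full"))
        elif size <= min_size:
            leaves.append((origin, size, "mixed"))
        else:
            half = size // 2
            # Bucket each occupied voxel by the origin of its child octant.
            children = {}
            for p in points:
                key = tuple(
                    o + half if c - o >= half else o
                    for o, c in zip(origin, p)
                )
                children.setdefault(key, []).append(p)
            ox, oy, oz = origin
            for dx in (0, half):
                for dy in (0, half):
                    for dz in (0, half):
                        child = (ox + dx, oy + dy, oz + dz)
                        visit(child, half, children.get(child, []))

    # set() removes duplicate coordinates so the 'full' check is reliable.
    visit((0, 0, 0), grid_size, list(set(occupied)))
    return leaves


if __name__ == "__main__":
    # One solid 8x8x8 corner block plus a single stray voxel in a 16^3 grid.
    voxels = {(x, y, z) for x in range(8) for y in range(8) for z in range(8)}
    voxels.add((8, 8, 8))
    result = classify_octants(voxels, grid_size=16, min_size=8)
    print(Counter(label for _, _, label in result))
```

In this sketch, empty and full octants need only their position, size, and label to be stored, which is where an octree saves space over an explicit grid before any learned compression is applied.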

