Abstract

Learning multi-scale spatial features from 3D geometric representations of objects such as point clouds, 3D CAD models, surfaces, and RGB-D data can potentially improve object recognition accuracy. Current deep learning approaches learn such features using structured data representations such as volume occupancy grids (voxels) and octrees, or unstructured representations such as graphs and point clouds. Structured representations are generally restricted by their inherent limitations on resolution, such as the voxel grid dimensions or the maximum octree depth. At the same time, it is challenging to learn directly from unstructured representations of 3D data due to non-uniformity among the samples. A hierarchical approach that maintains the structure at a larger scale while still accounting for the details at a smaller scale in specific spatial locations can provide an effective solution for learning from 3D data. In this paper, we propose a multi-level learning approach that captures large-scale features at a coarse level (for example, using a coarse voxelization) while simultaneously capturing sparse, small-scale features at a fine level (for example, using local fine-level voxel grids) at different spatial locations. To demonstrate the utility of the proposed multi-resolution learning, we use a multi-level voxel representation of CAD models to perform object recognition. The multi-level voxel representation consists of a coarse voxel grid containing volumetric information of the 3D objects and multiple fine-level voxel grids, one for each voxel in the coarse grid that contains a portion of the object boundary. In addition, we develop an interpretability-based feedback approach to transfer saliency information from one level of features to another in our hierarchical end-to-end learning framework. Finally, we demonstrate the performance of our multi-resolution learning algorithm for object recognition.
We outperform several previously published benchmarks for object recognition while using significantly less memory during training.
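To make the multi-level voxel representation concrete, the sketch below builds a coarse occupancy grid from a point sampling of an object and, for each occupied coarse voxel (a stand-in for the boundary-containing voxels described above), a local fine-level occupancy grid. This is a minimal illustrative sketch, not the authors' implementation; the function name, resolutions, and use of NumPy occupancy arrays are all assumptions.

```python
import numpy as np

def multilevel_voxelize(points, coarse_res=8, fine_res=4):
    """Hypothetical sketch of a two-level voxel representation:
    a coarse occupancy grid plus a local fine occupancy grid for
    every occupied coarse voxel (approximating boundary detail)."""
    # Normalize points into the unit cube [0, 1).
    mins, maxs = points.min(axis=0), points.max(axis=0)
    unit = (points - mins) / np.maximum(maxs - mins, 1e-12)
    unit = np.clip(unit, 0.0, 1.0 - 1e-9)

    # Coarse level: occupancy over a coarse_res^3 grid.
    coarse_idx = (unit * coarse_res).astype(int)
    coarse = np.zeros((coarse_res,) * 3, dtype=bool)
    coarse[tuple(coarse_idx.T)] = True

    # Fine level: for each occupied coarse voxel, voxelize the
    # points falling inside it on a local fine_res^3 grid.
    fine = {}
    for key in map(tuple, np.unique(coarse_idx, axis=0)):
        mask = np.all(coarse_idx == key, axis=1)
        local = unit[mask] * coarse_res - np.array(key)  # in [0, 1)
        fi = np.clip((local * fine_res).astype(int), 0, fine_res - 1)
        grid = np.zeros((fine_res,) * 3, dtype=bool)
        grid[tuple(fi.T)] = True
        fine[key] = grid
    return coarse, fine
```

In a learning pipeline, the coarse grid would feed a 3D convolutional branch capturing global shape, while the per-voxel fine grids supply localized detail only where the object actually has geometry, which is what keeps the memory footprint small relative to a single dense high-resolution grid.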
