Abstract

Objective. Most researchers convert point cloud data into regular three-dimensional voxel grids or collections of images, which makes the data unnecessarily voluminous and complicates their processing. The purpose of this study is to analyze the architecture of the PointNet neural network. Method. A unified approach is applied to a range of 3D recognition problems, from object classification and part segmentation to semantic scene parsing. Result. A comparative analysis of 2D and 3D object classification was carried out, and the layers and functions through which classification is performed were studied in detail. The network considered operates directly on point clouds and respects the permutation invariance of the points in the input data; it provides a unified architecture for applications ranging from object classification and part segmentation to scene semantic parsing. For semantic segmentation, the input can be either a single object from part-region segmentation or a subvolume of a 3D scene. Unlike the neural networks widely used for raster image editing, graphic design, and digital art, which operate on regular pixel grids, PointNet is a deep architecture that consumes point clouds directly. Conclusion. A new deep architecture for point clouds, PointNet, is presented. For the object classification task, the input point cloud is sampled directly from a shape or pre-segmented from a scene point cloud. To obtain a virtual model of the real world, neural network solutions are used, based on the assumption that an RGB point cloud is available, captured by an RGB-D camera from one or several viewpoints.
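
To illustrate the permutation-invariance property described above, the sketch below shows a minimal PointNet-style classifier in PyTorch: a shared per-point MLP followed by a symmetric max-pooling aggregation and a small classification head. This is an illustrative simplification, not the authors' implementation; the class name TinyPointNet, the layer widths, and the omission of the input and feature alignment networks (T-Nets) are assumptions made for brevity.

```python
# Minimal PointNet-style classifier (illustrative sketch; layer widths and the
# absence of the T-Net alignment modules are simplifications, not the paper's
# exact configuration).
import torch
import torch.nn as nn


class TinyPointNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Shared per-point MLP: the same weights are applied to every point,
        # so each point is embedded independently of the others.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 1024), nn.ReLU(),
        )
        # Classification head applied to the aggregated global feature.
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, 3) raw xyz coordinates.
        per_point = self.point_mlp(points)      # (batch, num_points, 1024)
        # Max pooling is a symmetric function, so the global feature does not
        # depend on the order in which the points are listed.
        global_feat, _ = per_point.max(dim=1)   # (batch, 1024)
        return self.head(global_feat)           # (batch, num_classes)


if __name__ == "__main__":
    model = TinyPointNet(num_classes=10)
    cloud = torch.rand(2, 1024, 3)              # two clouds of 1024 points each
    shuffled = cloud[:, torch.randperm(1024), :]
    out_a, out_b = model(cloud), model(shuffled)
    print(torch.allclose(out_a, out_b, atol=1e-5))  # True: order does not matter
```

Because the global feature is produced by a symmetric function, shuffling the order of the input points leaves the output unchanged, which the final check in the sketch demonstrates.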
