Impact of Conventional and Deep Learning-based Point Cloud Geometry Coding on Deep Learning-based Classification Performance
Deep learning (DL)-based point cloud (PC) classification is a key computer vision task for many applications, notably autonomous driving, surveillance, and cultural heritage. In many application scenarios, PCs must be coded to reach practical rates for storage and transmission purposes, and thus they suffer from more or less intense compression artifacts. After the specification of two MPEG PC coding standards, DL-based PC coding has gained momentum, reaching competitive compression performance, especially for dense PCs. Since using decoded PCs, which may suffer from compression artifacts, may impact the final classification performance, the main goal of this paper is to study the impact of static PC geometry coding on DL-based classification. This study is performed on the ModelNet40 test dataset using the conventional G-PCC coding standard and the DL-based PC geometry codec that was the top-performing solution responding to the recent JPEG Pleno PC Coding Call for Proposals. Two high-performing DL-based classifiers are used, considering the original PC geometry before and after voxelization, as well as the decoded PC geometry for different rates and qualities. As expected, coding has an impact on the classification performance, especially for the lower rates/qualities. For very sparse PCs, conventional coding still has an advantage, contrary to dense PCs, but this should change in the future with DL-based tools becoming the most natural solutions for both PC geometry coding and classification.
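The abstract above compares PC geometry before and after voxelization. As a rough illustration only, the sketch below shows one common way to voxelize a point cloud: scale the coordinates onto an integer grid and merge points that fall into the same cell. The function name `voxelize`, the `bit_depth` parameter, and the toy input are assumptions made for this example, not details taken from the paper.

```python
def voxelize(points, bit_depth=6):
    """Quantize float (x, y, z) points onto a 2**bit_depth integer grid."""
    size = (1 << bit_depth) - 1
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    # A single scale for all axes preserves the object's aspect ratio.
    # Fall back to 1.0 if every point is identical (zero extent).
    extent = max(maxs[i] - mins[i] for i in range(3)) or 1.0
    voxels = set()
    for p in points:
        voxels.add(tuple(int(round((p[i] - mins[i]) / extent * size))
                         for i in range(3)))
    # Points sharing a cell collapse into one voxel, so the output
    # may contain fewer points than the input.
    return sorted(voxels)


# Hypothetical toy input: two of the three points share a grid cell.
cloud = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.001, 0.0, 0.0)]
print(voxelize(cloud, bit_depth=2))
```

A lower `bit_depth` merges more points per cell, which is one reason voxelization alone can already change what a classifier sees.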
- Research Article
105
- 10.1109/jstsp.2020.3047520
- Dec 25, 2020
- IEEE Journal of Selected Topics in Signal Processing
Point clouds are a very rich 3D visual representation model, which has become increasingly appealing for multimedia applications with immersion, interaction and realism requirements. Due to different acquisition and creation conditions as well as target applications, point clouds' characteristics may be very diverse, notably on their density. While geographical information systems or autonomous driving applications may use rather sparse point clouds, cultural heritage or virtual reality applications typically use denser point clouds to more accurately represent objects and people. Naturally, to offer immersion and realism, point clouds need a rather large number of points, thus asking for the development of efficient coding solutions. The use of deep learning models for coding purposes has recently gained relevance, with latest developments in image coding achieving state-of-the-art performance, thus making natural the adoption of this technology also for point cloud coding. This paper presents a novel deep learning-based solution for point cloud geometry coding which is able to efficiently adapt to the content's characteristics. The proposed coding solution divides the point cloud into 3D blocks and selects the most suitable available deep learning coding model to code each block, thus maximizing the compression performance. In comparison to the state-of-the-art MPEG G-PCC Trisoup standard, the proposed coding solution offers average quality gains up to 4.9 and 5.7 dB for PSNR D1 and PSNR D2, respectively.
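The block-wise adaptation described above can be sketched as follows. The abstract does not state the selection criterion, so this example assumes a Lagrangian rate-distortion cost `D + lam * R`. The "models" are hypothetical stand-in callables that return a `(distortion, rate)` pair, not trained networks.

```python
from collections import defaultdict


def split_into_blocks(voxels, block_size):
    """Group integer voxels by the 3D block that contains them."""
    blocks = defaultdict(list)
    for x, y, z in voxels:
        origin = (x // block_size, y // block_size, z // block_size)
        # Store coordinates relative to the block origin.
        blocks[origin].append((x % block_size, y % block_size, z % block_size))
    return dict(blocks)


def select_model(block, models, lam):
    """Return the name of the model with the lowest cost D + lam * R.

    `models` maps a name to a callable returning (distortion, rate).
    The cost function is an assumption made for this sketch.
    """
    best_name, best_cost = None, float("inf")
    for name, code in models.items():
        distortion, rate = code(block)
        cost = distortion + lam * rate
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name


# Hypothetical stand-ins: one cheap but coarse model, one costly but accurate.
toy_models = {
    "low_rate": lambda b: (1.0 * len(b), 2.0),
    "high_rate": lambda b: (0.1 * len(b), 10.0),
}

blocks = split_into_blocks([(0, 0, 0), (1, 1, 1), (9, 0, 0)], block_size=8)
for origin, block in blocks.items():
    print(origin, select_model(block, toy_models, lam=1.0))
```

Because the choice is made per block, a decoder would also need to be told which model was used for each block; that signalling is omitted here.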
- Conference Article
7
- 10.1109/euvip53989.2022.9922784
- Sep 11, 2022
Point clouds represent 3D visual data in a very immersive and realistic way, offering users a large degree of navigation and interaction. For some key use cases, point clouds are potentially lighter and easier to acquire than other 3D representation models, thus offering an alternative with lower computational cost. To offer visually realistic and immersive experiences, notably the illusion of well-formed surfaces, point clouds typically require a large number of points. To make their storage and transmission feasible, efficient point cloud coding is essential. Recently, deep learning-based point cloud coding solutions have proven to be competitive in compression performance, excelling in distinct scenarios, although struggling to achieve similar results for sparser point clouds and lower coding rates. To tackle these limitations, this paper proposes a double-deep learning-based approach for point cloud coding by integrating a super-resolution tool. The main idea consists in converting sparser point clouds into denser ones via a down-sampling step performed before coding. Since this is a lossy process, a super-resolution step is included after decoding to mitigate the point losses and bring the point cloud back to the initial resolution. Furthermore, the sampling factor can be adaptively selected, thus offering additional flexibility to the point cloud characteristics. The proposed double-deep coding and super-resolution solution outperforms both the G-PCC Octree and V-PCC Intra point cloud coding standards, achieving, respectively, 81.9% and 22.3% rate reduction measured as BD-Rate for the PSNR D1 metric.
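To illustrate why down-sampling makes a sparse point cloud look denser on its grid, here is a minimal sketch under simple assumptions: coordinates are non-negative integers, down-sampling is an integer division followed by duplicate removal, and `naive_upsample` is a deliberately crude placeholder for the learned super-resolution step, which is not reproduced here. All function names are hypothetical.

```python
def downsample(voxels, factor):
    """Divide coordinates by `factor` and merge voxels that collide."""
    return sorted({(x // factor, y // factor, z // factor) for x, y, z in voxels})


def naive_upsample(voxels, factor):
    """Scale back to the original grid (stand-in for learned super-resolution)."""
    return sorted({(x * factor, y * factor, z * factor) for x, y, z in voxels})


def neighbor_ratio(voxels):
    """Fraction of voxels with at least one occupied 6-connected neighbor."""
    occupied = set(voxels)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    hits = sum(
        any((x + dx, y + dy, z + dz) in occupied for dx, dy, dz in steps)
        for x, y, z in occupied
    )
    return hits / len(occupied)


# Hypothetical sparse input: every other cell along one axis is empty.
sparse = [(0, 0, 0), (2, 0, 0), (4, 0, 0), (6, 0, 0)]
dense = downsample(sparse, 2)
print(neighbor_ratio(sparse), neighbor_ratio(dense))
# Down-sampling is lossy: distinct points can merge into one voxel.
print(downsample([(0, 0, 0), (1, 0, 0)], 2))
```

The merged points in the last call are exactly the losses that a super-resolution step would try to recover after decoding.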
- Research Article
5
- 10.1109/access.2025.3549316
- Jan 1, 2025
- IEEE Access
Efficient point cloud coding has become increasingly critical for multiple applications such as virtual reality, autonomous driving, and digital twin systems, where rich and interactive 3D data representations may functionally make the difference. Deep learning has emerged as a powerful tool in this domain, offering advanced techniques for compressing point clouds more efficiently than conventional coding methods while also allowing effective computer vision tasks performed in the compressed domain thus, for the first time, making available a common compressed visual representation effective for both man and machine. Taking advantage of this potential, JPEG has recently finalized the JPEG Pleno Learning-based Point Cloud Coding (PCC) standard offering efficient lossy coding of static point clouds, targeting both human visualization and machine processing by leveraging deep learning models for geometry and color coding. The geometry is processed directly in its original 3D form using sparse convolutional neural networks, while the color data is projected onto 2D images and encoded using the also learning-based JPEG AI standard. The goal of this paper is to provide a complete technical description of the JPEG PCC standard, along with a thorough benchmarking of its performance against the state-of-the-art, while highlighting its main strengths and weaknesses. In terms of compression performance, JPEG PCC outperforms the conventional MPEG PCC standards, especially in geometry coding, achieving significant rate reductions. Color compression performance is less competitive but this is overcome by the power of a full learning-based coding framework for both geometry and color and the associated effective compressed domain processing.
- Research Article
14
- 10.1109/tmm.2023.3338081
- Jan 1, 2025
- IEEE Transactions on Multimedia
In this golden age of multimedia, realistic content is in high demand with users seeking more immersive and interactive experiences. As a result, new image modalities for 3D representations have emerged in recent years, among which point clouds have deserved special attention. Naturally, with this increase in demand, efficient storage and transmission became a must, with standardization groups such as MPEG and JPEG entering the scene, as happened before with other types of visual media. In a surprising development, JPEG issued a Call for Proposals on point cloud coding targeting exclusively learning-based solutions, in parallel to a similar call for image coding. This is a natural consequence of the growing popularity of deep learning, which due to its excellent performance is currently dominant in the multimedia processing field, including coding. This paper presents the coding solution selected by JPEG as the best-performing response to the Call for Proposals and adopted as the first version of the JPEG Pleno Point Cloud Coding Verification Model, in practice the first step for developing a standard. The proposed solution offers a novel joint geometry and color approach for point cloud coding, in which a single deep learning model processes both geometry and color simultaneously. To maximize the RD performance for a large range of point clouds, the proposed solution uses down-sampling and learning-based super-resolution as pre- and post-processing steps. Compared to the MPEG point cloud coding standards, the proposed coding solution comfortably outperforms G-PCC for geometry, color, and joint quality metrics.
- Research Article
10
- 10.1109/lgrs.2022.3141073
- Jan 1, 2022
- IEEE Geoscience and Remote Sensing Letters
Nowadays, the classification of point clouds has become a fundamental problem in 3-D information study. Unlike natural images in deep learning, 3-D point clouds are massive and unorganized, which makes it difficult for the convolution process to capture features directly. This letter proposes a new augmentation convolutional neural network (ACNN) to classify point clouds by adding a key augmentation layer before the classical sampling and convolution structure. Input data are augmented before each sampling layer, which brings abundant learning information to help the network capture more local structures. To make the augmentation more effective, we make the parameters of the augmentation layers learnable, so that they are updated during training according to the loss function. The proposed augmentation is based on automatically tuning the magnitude of the smoothness, which plays a significant role in point cloud processing and provides local features, for example, edges and contours. Results show an overall accuracy of 92.52% and 89.11% in object classification on ModelNet10 and ModelNet40, respectively, which shows our superiority over other methods. Besides, the ACNN achieves an average miscalculation error of 0.28 and a cross-entropy loss of 0.48 in the classification of laser scanning point clouds, which shows high robustness to noise and density in outdoor scene classification.
- Preprint Article
- 10.5194/egusphere-egu23-15600
- May 15, 2023
Remotely sensed point clouds provide detailed structural data of landscapes and ecosystem characteristics. Especially in the analysis of forests and topography, this data type has proven its ability to derive relevant quantitative parameters such as biomass or subsidence rates. Arctic and boreal permafrost ecosystems are severely affected by climate change and resulting vegetation shifts, environmental disturbances, and permafrost thaw which lead to rapid changes in these northern environments that can be detected and characterized with point cloud datasets. In recent decades, the amount of point clouds acquired and generated in high-latitude regions by terrestrial (TLS), mobile (MLS), unmanned aerial system (UAS) based (ULS), up to airborne-based (ALS) LiDAR (Light detection and ranging) and Structure from Motion (SfM) has steadily increased. Multi-temporal datasets are available for a wide range of observation targets. The characteristics of the point clouds such as the extent of the area covered as well as the point density and thus the level of detail differ depending on the sensor, method, and the acquisition specifications. To use point cloud data for topographic, morphological, and forestry analysis, segmentation and classification of the point cloud into specific components such as individual trees, stems, foliage, or terrain features is essential. This is a time-consuming manual process and not feasible when addressing large datasets. Several previous analyses showed the potential for machine learning-based semantic segmentation of a single point cloud type, e.g., terrestrial LiDAR (TLS) with identical acquisition mode and sensor. We aim at an automated segmentation of different point cloud types generated by i) TLS, MLS, ULS and ALS as well as ii) SfM using (multi)spectral UAS and airborne image data to enable an analysis of Arctic and boreal permafrost ecosystems.
Thereby, we will focus on the following questions:
1) How can we reduce the time-consuming process of labeling the point clouds?
2) Can we train a model for segmentation using all point clouds or does transfer learning lead to better results?
3) To what level of detail can we accurately segment and classify the different point cloud types?
With this automated segmentation and classification, we aim to open up the possibility of exploiting the information contained in the multitude of point cloud data for a variety of ecological research applications.
- Research Article
5
- 10.3233/jifs-189694
- Jan 1, 2021
- Journal of Intelligent & Fuzzy Systems
To address automatic point cloud classification in surveys of vegetation resources in straw checkerboard barrier regions, an improved random forest point cloud classification algorithm is proposed. To tackle decision tree redundancy and absolute majority voting in the existing random forest algorithm, the similarity between decision trees is first computed from the tree edit distance, the trees are then clustered and reduced using the maximum and minimum distance algorithm, and finally the classification accuracy of each decision tree is used to construct a weight matrix for weighted voting at the voting stage. Before random forest classification, a total of 20 single-point and multi-point statistical features are selected according to the characteristics of the point cloud data. Based on the spatial distribution of the data, three neighborhood scales are set according to the point cloud density, feature sets are constructed at each scale, and the most important features are retained for classification after variable importance scoring. The experimental results show that the optimized random forest algorithm achieves an overall classification accuracy of 94.15% on dataset 1, acquired by lidar, and 92.03% on dataset 2, obtained by dense matching. Both are higher than those of the unoptimized random forest algorithm and of the MRF and SVM point cloud classification methods, and dimensionality reduction through feature optimization greatly improves the efficiency of the algorithm.
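The difference between absolute majority voting and accuracy-weighted voting can be shown with a small sketch. It simplifies the weight matrix mentioned in the abstract to a single scalar weight per tree. The labels and accuracy values are made-up illustration data, and `weighted_vote` is a hypothetical helper, not the authors' implementation.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Plain majority voting: every tree counts equally."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_vote(predictions, accuracies):
    """Return the label with the largest accuracy-weighted vote.

    predictions[i] is the label chosen by tree i; accuracies[i] is that
    tree's classification accuracy, used here as its voting weight.
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, accuracies):
        scores[label] += weight
    return max(scores, key=scores.get)


# Made-up example: two weak trees outvote one strong tree by count,
# but not once votes are weighted by accuracy.
tree_labels = ["ground", "ground", "shrub"]
tree_accuracy = [0.40, 0.40, 0.95]
print(majority_vote(tree_labels))
print(weighted_vote(tree_labels, tree_accuracy))
```

With these numbers the two schemes disagree, which is the situation weighted voting is meant to handle.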
- Research Article
- 10.1177/18758967251335691
- May 8, 2025
- Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology
In recent years, with the development of technologies such as computer vision, machine learning, and deep learning, as well as the popularity of large-scale data collection devices, 3D point cloud processing has become increasingly important. 3D point cloud processing can be widely used in fields such as object recognition, robot navigation, building information modeling (BIM), and urban planning. As more and more 3D point cloud data are acquired, it has become a challenge for present 3D point cloud processing models to process these data accurately and efficiently. To improve the accuracy of point cloud classification and segmentation tasks, this study proposes an improved point cloud classification and segmentation model based on neighborhood-aware information fusion. The model includes a Fusion Neighbor Information Feature Enhancement (FNIFE) module, which connects points in the local neighborhood and obtains the features of the current point through the feature relationships between the points in the neighborhood. By enhancing the feature expression of the point, it reduces the feature loss caused by the feature extraction operation and improves the accuracy of point cloud classification. Additionally, the model includes a Reverse Transmission of Point Features (RToPF) module, in which interpolation parameters are adjusted to ensure that the enhanced feature information can be effectively transmitted, thereby improving the accuracy and computing speed of the model. Finally, to further improve classification accuracy, a module containing the X-Conv operator is used in the model to replace the max-pooling in the original network and reduce the feature loss generated during feature extraction. Comparative experiments are conducted on the ModelNet40, ShapeNet, S3DIS, and ScanNet datasets. The experimental results show that the overall accuracy of the proposed model reaches 92.4%.
The average accuracy reaches 90.2% in the point cloud classification task, and the average intersection ratio reaches 84.5% in the point cloud segmentation task, achieving superior performance in classification and segmentation tasks compared with the state-of-the-art models.
- Conference Article
21
- 10.1109/mmsp48831.2020.9287060
- Sep 21, 2020
Point clouds are a 3D visual representation format that has recently become fundamentally important for immersive and interactive multimedia applications. Considering the high number of points of practically relevant point clouds, and their increasing market demand, efficient point cloud coding has become a vital research topic. In addition, scalability is an important feature for point cloud coding, especially for real-time applications, where the fast and rate efficient access to a decoded point cloud is important; however, this issue is still rather unexplored in the literature. In this context, this paper proposes a novel deep learning-based point cloud geometry coding solution with resolution scalability via interlaced sub-sampling. As additional layers are decoded, the number of points in the reconstructed point cloud increases as well as the overall quality. Experimental results show that the proposed scalable point cloud geometry coding solution outperforms the recent MPEG Geometry-based Point Cloud Compression standard which is much less scalable.
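Resolution scalability via interlaced sub-sampling can be pictured with the sketch below. It assumes one simple interlacing pattern: voxels are split into `step**3` layers according to the remainder of each coordinate. The exact pattern used in the paper may differ. Each layer is a lower-resolution point cloud, and decoding more layers adds points to the reconstruction. Function names are hypothetical.

```python
def interlaced_layers(voxels, step=2):
    """Split voxels into up to step**3 layers keyed by coordinate remainders."""
    layers = {}
    for x, y, z in voxels:
        key = (x % step, y % step, z % step)
        # Within a layer, coordinates live on a grid that is `step` times coarser.
        layers.setdefault(key, []).append((x // step, y // step, z // step))
    return layers


def reconstruct(layers, step=2, keep=None):
    """Rebuild full-resolution voxels from the layers listed in `keep`.

    With keep=None every available layer is used (full quality).
    """
    keys = sorted(layers) if keep is None else keep
    out = []
    for key in keys:
        for x, y, z in layers.get(key, []):
            out.append((x * step + key[0], y * step + key[1], z * step + key[2]))
    return sorted(out)


# Hypothetical input: a fully occupied 4x4x4 cube.
cube = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
layers = interlaced_layers(cube)
base_only = reconstruct(layers, keep=[(0, 0, 0)])
everything = reconstruct(layers)
print(len(base_only), len(everything))
```

In a real codec each layer would be coded separately, so a decoder could stop after any layer and still obtain a usable, lower-density point cloud.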
- Research Article
49
- 10.1007/s11119-021-09803-0
- Mar 26, 2021
- Precision Agriculture
Crop discrimination at the plant or patch level is vital for modern technology-enabled agriculture. Multispectral and hyperspectral remote sensing data have been widely used for crop classification. Even though spectral data are successful in classifying row crops and orchards, they are limited in discriminating vegetable and cereal crops at the plant or patch level. Terrestrial laser scanning is a potential remote sensing approach that offers distinct structural features useful for classification of crops at the plant or patch level. The objective of this research is the improvement and application of an advanced deep learning framework for object-based classification of three vegetable crops (cabbage, tomato, and eggplant) using high-resolution LiDAR point clouds. Point clouds from a terrestrial laser scanner (TLS) were acquired over experimental plots of the University of Agricultural Sciences, Bengaluru, India. As part of the methodology, a deep convolutional neural network (CNN) model named CropPointNet is devised for the semantic segmentation of crops from a 3D perspective. CropPointNet is an adaptation of the PointNet deep CNN model developed for the segmentation of indoor objects in a typical computer vision scenario. Apart from adapting to 3D point cloud segmentation of crops, the significant methodological improvements made in CropPointNet are a random sampling scheme for the training point clouds, and optimization of the network architecture to enable structural attribute-based segmentation of point clouds of unstructured objects such as TLS point clouds of crops. The performance of the 3D crop classification has been validated and compared against two popular deep learning architectures: PointNet and the Dynamic Graph-based Convolutional Neural Network (DGCNN). Results indicate consistent plant-level object-based classification of crop point clouds with overall accuracies of 81% or better for all three crops.
The CropPointNet architecture proposed in this research can be generalized for segmentation and classification of other row crops and natural vegetation types.
- Research Article
13
- 10.3390/ijgi9030182
- Mar 24, 2020
- ISPRS International Journal of Geo-Information
The classification and segmentation of large-scale, sparse LiDAR point clouds with deep learning are widely used in engineering survey and geoscience. The loose structure and the non-uniform point density are the two major constraints on the use of sparse point clouds. This paper proposes a lightweight auxiliary network, called the rotated density-based network (RD-Net), and a novel point cloud preprocessing method, Grid Trajectory Box (GT-Box), to solve these problems. The combination of RD-Net and PointNet was used to achieve high-precision 3D classification and segmentation of sparse point clouds. It emphasizes the importance of the density feature of LiDAR points for 3D object recognition in sparse point clouds. Furthermore, RD-Net plus PointCNN, PointNet, PointCNN, and RD-Net were introduced as comparisons. Public datasets were used to evaluate the performance of the proposed method. The results showed that RD-Net could significantly improve the performance of sparse point cloud recognition for the coordinate-based network and could improve the classification accuracy to 94% and the segmentation per-accuracy to 70%. Additionally, the results indicated that point-density information has an independent spatial–local correlation and plays an essential role in the process of sparse point cloud recognition.
- Research Article
13
- 10.1109/mmul.2020.3046691
- Dec 22, 2020
- IEEE MultiMedia
As the interest in deep learning tools continues to rise, new multimedia research fields begin to discover its potential. Both image and point cloud coding are good examples of technologies, where deep learning-based solutions have recently displayed very competitive performance. In this context, this article brings two novel contributions to the point cloud geometry coding state-of-the-art; first, a novel neighborhood adaptive distortion metric to be used in the training loss function, which allows significantly improving the rate-distortion performance with commonly used objective quality metrics; second, an explicit quantization approach at the training and coding times to generate varying rate/quality with a single trained deep learning coding model, effectively reducing the training complexity and storage requirements. The result is an improved deep learning-based point cloud geometry coding solution, which is both more compression efficient and less demanding in training complexity and storage.
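The second contribution above, obtaining several rate/quality points from one trained model through explicit quantization, can be illustrated with a minimal sketch. It assumes plain uniform scalar quantization of latent values with a configurable step size. The latent values are made up, and the number of distinct symbols is used only as a crude proxy for rate, since entropy coding is not modeled here.

```python
def quantize(latents, step):
    """Map each latent value to an integer index using the given step size."""
    return [round(v / step) for v in latents]


def dequantize(indices, step):
    """Recover approximate latent values from the integer indices."""
    return [i * step for i in indices]


def max_error(latents, step):
    """Largest absolute reconstruction error for this step size."""
    recon = dequantize(quantize(latents, step), step)
    return max(abs(a - b) for a, b in zip(latents, recon))


# Made-up latent values standing in for the output of a trained encoder.
latents = [0.12, -0.47, 1.9, 0.51, -1.3]
for step in (0.25, 2.0):
    symbols = quantize(latents, step)
    # A larger step yields fewer distinct symbols and a larger error.
    print(step, len(set(symbols)), max_error(latents, step))
```

Only the step size changes between the two operating points, so no second model has to be trained or stored to move along the rate/quality curve.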
- Research Article
11
- 10.1016/j.gmod.2023.101173
- Mar 31, 2023
- Graphical Models
High-fidelity point cloud completion with low-resolution recovery and noise-aware upsampling
- Conference Article
33
- 10.1145/3474085.3475381
- Oct 17, 2021
Point clouds obtained from 3D sensors are usually sparse. Existing methods mainly focus on upsampling sparse point clouds in a supervised manner by using dense ground truth point clouds. In this paper, we propose a self-supervised point cloud upsampling network (SSPU-Net) to generate dense point clouds without using ground truth. To achieve this, we exploit the consistency between the input sparse point cloud and generated dense point cloud for the shapes and rendered images. Specifically, we first propose a neighbor expansion unit (NEU) to upsample the sparse point clouds, where the local geometric structures of the sparse point clouds are exploited to learn weights for point interpolation. Then, we develop a differentiable point cloud rendering unit (DRU) as an end-to-end module in our network to render the point cloud into multi-view images. Finally, we formulate a shape-consistent loss and an image-consistent loss to train the network so that the shapes of the sparse and dense point clouds are as consistent as possible. Extensive results on the CAD and scanned datasets demonstrate that our method can achieve impressive results in a self-supervised manner.
- Research Article
17
- 10.3390/app9050951
- Mar 6, 2019
- Applied Sciences
3D point cloud classification has wide applications in the field of scene understanding. Point-based point cloud classification can more accurately segment the boundary region between adjacent objects. In this paper, a point cloud classification algorithm based on single-point multilevel feature fusion and pyramid neighborhood optimization is proposed for an Airborne Laser Scanning (ALS) point cloud. First, the proposed algorithm determines the neighborhood region of each point, after which the features of each single point are extracted. For the characteristics of the ALS point cloud, two new feature descriptors are proposed, i.e., a normal angle distribution histogram and a latitude sampling histogram. Following this, multilevel features of a single point are constructed from multiple resolutions of the point cloud and multiple neighborhood spaces. Next, a Support Vector Machine with a Gaussian kernel function is trained on the features, and the points are classified by the trained model. Finally, the classification results are optimized using a multi-scale pyramid neighborhood constructed from a multi-resolution point cloud. In the experiment, the algorithm is tested on a public dataset. The experimental results show that the proposed algorithm can effectively classify large-scale ALS point clouds. Compared with the existing algorithms, the proposed algorithm has a better classification performance.