Sparse 3D Point Cloud Parallel Multi-Scale Feature Extraction and Dense Reconstruction with Multi-Headed Attentional Upsampling

Abstract

Three-dimensional (3D) point clouds have a wide range of applications in 3D vision, and the quality of the acquired point cloud data considerably affects subsequent processing. Because point cloud data are sparse and irregular, processing them has always been challenging. Existing deep learning-based methods for dense point cloud reconstruction suffer from over-smoothed results and too many outliers. The reason is that they cannot extract local and global features at different scales or pay different levels of attention to different regions, and so fail to capture the long-distance dependencies needed for dense reconstruction. In this paper, we use a parallel multi-scale (PMS) feature extraction module based on graph convolution and an upsampling method with an added multi-head attention mechanism to process sparse and irregular point cloud data and obtain expanded point clouds. Specifically, a point cloud training patch with 256 points is taken as input. The PMS module uses three residual connections in the multi-scale feature extraction stage. Each PMS module consists of three parallel DenseGCN modules with different convolution kernel sizes and different average pooling sizes, so that local and global feature information over the augmented receptive field is extracted efficiently. The scale information is obtained by average pooling over the different augmented receptive fields. The upsampling stage uses an upsampling rate of r=4. Self-attention features that focus differently on different regions of the point cloud are fused with different weights, which makes the feature representation more diverse. This operation avoids the bias of a single attention head, and each head focuses on extracting valuable fine-grained feature information. Finally, the coordinate reconstruction module outputs a dense point cloud of 1024 points. Experiments show that the proposed method achieves good evaluation metrics and better visual quality. The problems of over-smoothing and excessive outliers are effectively mitigated, and the sparse input point cloud is reconstructed into a denser one.
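
To make the attention and upsampling steps in the abstract more concrete, here is a minimal sketch in plain Python. It is not the paper's network. It assumes that the per-point features themselves serve as queries, keys and values (no learned projections), that heads are formed by splitting the feature channels, and that feature expansion is plain duplication. The names attention_head, multi_head_self_attention and expand_features are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(feats):
    """Scaled dot-product self-attention for one head.

    Assumption: the features act as queries, keys and values directly
    (no learned projection matrices), to keep the sketch short.
    """
    dim = len(feats[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in feats:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in feats]
        row = softmax(scores)  # attention of this point over all points
        weights.append(row)
        outputs.append(
            [sum(w * value[c] for w, value in zip(row, feats)) for c in range(dim)]
        )
    return outputs, weights


def multi_head_self_attention(feats, num_heads):
    """Split channels into heads, attend per head, concatenate the results."""
    dim = len(feats[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    per_head = []
    for h in range(num_heads):
        chunk = [f[h * head_dim:(h + 1) * head_dim] for f in feats]
        out, _ = attention_head(chunk)
        per_head.append(out)
    # Concatenate the head outputs for each point.
    return [
        sum((per_head[h][i] for h in range(num_heads)), [])
        for i in range(len(feats))
    ]


def expand_features(feats, rate):
    """Naive stand-in for a learned expansion: repeat each feature `rate` times."""
    return [list(f) for f in feats for _ in range(rate)]
```

With a 256-point patch and rate 4, expand_features returns 1024 feature vectors, which matches the 256-to-1024 point counts stated in the abstract. A real model would follow this with a learned coordinate regression.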

Similar Papers
  • Research Article
  • Citations: 11
  • 10.1016/j.gmod.2023.101173
High-fidelity point cloud completion with low-resolution recovery and noise-aware upsampling
  • Mar 31, 2023
  • Graphical Models
  • Ren-Wu Li + 4 more

  • Conference Article
  • Citations: 33
  • 10.1145/3474085.3475381
SSPU-Net: Self-Supervised Point Cloud Upsampling via Differentiable Rendering
  • Oct 17, 2021
  • Yifan Zhao + 2 more

Point clouds obtained from 3D sensors are usually sparse. Existing methods mainly focus on upsampling sparse point clouds in a supervised manner by using dense ground truth point clouds. In this paper, we propose a self-supervised point cloud upsampling network (SSPU-Net) to generate dense point clouds without using ground truth. To achieve this, we exploit the consistency between the input sparse point cloud and generated dense point cloud for the shapes and rendered images. Specifically, we first propose a neighbor expansion unit (NEU) to upsample the sparse point clouds, where the local geometric structures of the sparse point clouds are exploited to learn weights for point interpolation. Then, we develop a differentiable point cloud rendering unit (DRU) as an end-to-end module in our network to render the point cloud into multi-view images. Finally, we formulate a shape-consistent loss and an image-consistent loss to train the network so that the shapes of the sparse and dense point clouds are as consistent as possible. Extensive results on the CAD and scanned datasets demonstrate that our method can achieve impressive results in a self-supervised manner.

  • Conference Article
  • Citations: 60
  • 10.1109/iros.2016.7759604
Fast and robust 3D feature extraction from sparse point clouds
  • Oct 1, 2016
  • Jacopo Serafin + 2 more

Matching 3D point clouds, a critical operation in map building and localization, is difficult with Velodyne-type sensors due to the sparse and non-uniform point clouds that they produce. Standard methods from dense 3D point clouds are generally not effective. In this paper, we describe a feature-based approach using Principal Components Analysis (PCA) of neighborhoods of points, which results in mathematically principled line and plane features. The key contribution in this work is to show how this type of feature extraction can be done efficiently and robustly even on non-uniformly sampled point clouds. The resulting detector runs in real-time and can be easily tuned to have a low false positive rate, simplifying data association. We evaluate the performance of our algorithm on an autonomous car at the MCity Test Facility using a Velodyne HDL-32E, and we compare our results against the state-of-the-art NARF keypoint detector.

  • Conference Article
  • Citations: 26
  • 10.1109/iros.2013.6696479
Real-time SLAM with piecewise-planar surface models and sparse 3D point clouds
  • Nov 1, 2013
  • Paul Ozog + 1 more

This paper reports on the use of planar patches as features in a real-time simultaneous localization and mapping (SLAM) system to model smooth surfaces as piecewise-planar. This approach works well for using observed point clouds to correct odometry error, even when the point cloud is sparse. Such sparse point clouds are easily derived by Doppler velocity log sensors for underwater navigation. Each planar patch contained in this point cloud can be constrained in a factor-graph-based approach to SLAM so that neighboring patches are sufficiently coplanar so as to constrain the robot trajectory, but not so much so that the curvature of the surface is lost in the representation. To validate our approach, we simulated a virtual 6-degree of freedom robot performing a spiral-like survey of a sphere, and provide real-world experimental results for an autonomous underwater vehicle used for automated ship hull inspection. We demonstrate that using the sparse 3D point cloud greatly improves the self-consistency of the map. Furthermore, the use of our piecewise-planar framework provides an additional constraint to multi-session underwater SLAM, improving performance over monocular camera measurements alone.

  • Research Article
  • Citations: 4
  • 10.3390/rs16234513
Real-Time Environmental Contour Construction Using 3D LiDAR and Image Recognition with Object Removal
  • Dec 1, 2024
  • Remote Sensing
  • Tzu-Jung Wu + 2 more

In recent years, due to the significant advancements in hardware sensors and software technologies, 3D environmental point cloud modeling has gradually been applied in the automation industry, autonomous vehicles, and construction engineering. With the high-precision measurements of 3D LiDAR, its point clouds can clearly reflect the geometric structure and features of the environment, thus enabling the creation of high-density 3D environmental point cloud models. However, due to the enormous quantity of high-density 3D point clouds, storing and processing these 3D data requires a considerable amount of memory and computing time. In light of this, this paper proposes a real-time 3D point cloud environmental contour modeling technique. The study uses the point cloud distribution from the 3D LiDAR body frame point cloud to establish structured edge features, thereby creating a 3D environmental contour point cloud map. Additionally, unstable objects such as vehicles will appear during the mapping process; these specific objects will be regarded as not part of the stable environmental model in this study. To address this issue, the study will further remove these objects from the 3D point cloud through image recognition and LiDAR heterogeneous matching, resulting in a higher quality 3D environmental contour point cloud map. This 3D environmental contour point cloud not only retains the recognizability of the environmental structure but also solves the problems of massive data storage and processing. Moreover, the method proposed in this study can achieve real-time realization without requiring the 3D point cloud to be organized in a structured order, making it applicable to unorganized 3D point cloud LiDAR sensors. Finally, the feasibility of the proposed method in practical applications is also verified through actual experimental data.

  • Conference Article
  • Citations: 1
  • 10.1117/12.2539780
Sparse 3D point clouds segmentation considering 2D image feature extraction with deep learning
  • Aug 14, 2019
  • Li Yusheng + 2 more

Three-dimensional (3D) point cloud segmentation plays an important role in autonomous navigation systems, such as mobile robots and autonomous cars. However, the segmentation is challenging because of data sparsity, uneven sampling density, irregular format, and lack of color texture. In this paper, we propose a sparse 3D point cloud segmentation method based on 2D image feature extraction with deep learning. First, we jointly calibrate the camera and lidar to obtain the extrinsic parameters (rotation matrix and translation vector). Then, we use Convolutional Neural Network (CNN)-based object detectors to generate 2D object region proposals in the RGB image and classify the objects. Finally, based on the extrinsic parameters from the joint calibration, we extract the points from a 16-line RS-LIDAR-16 scanner that project into the 2D object regions, and perform further fine segmentation of the extracted point cloud according to prior knowledge of the classification features. Experiments demonstrate the effectiveness of the proposed sparse point cloud segmentation method.
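
The projection step described above (lidar points mapped into a 2D object region using the rotation matrix and translation vector) can be sketched as follows. This is an illustration under assumptions, not the authors' code: it assumes an ideal pinhole camera without lens distortion, and the function names and the intrinsics tuple (fx, fy, cx, cy) are hypothetical.

```python
def project_point(p_lidar, R, t, fx, fy, cx, cy):
    """Map a lidar point into the camera frame and project it to pixels.

    R is a 3x3 rotation matrix (list of rows) and t a 3-vector, i.e. the
    extrinsic parameters. Returns (u, v), or None if the point lies behind
    the camera. Assumes a distortion-free pinhole model.
    """
    x, y, z = (
        sum(R[i][j] * p_lidar[j] for j in range(3)) + t[i] for i in range(3)
    )
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def points_in_box(points, box, R, t, intrinsics):
    """Keep the lidar points whose projection falls inside a 2D box.

    box is (u_min, v_min, u_max, v_max) in pixels, e.g. a detector proposal.
    """
    u_min, v_min, u_max, v_max = box
    selected = []
    for p in points:
        uv = project_point(p, R, t, *intrinsics)
        if uv is not None and u_min <= uv[0] <= u_max and v_min <= uv[1] <= v_max:
            selected.append(p)
    return selected
```

The selected points form the candidate cluster for one detected object; any finer segmentation inside that cluster is outside the scope of this sketch.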

  • Conference Article
  • Citations: 277
  • 10.1145/3349624.3356768
RadHAR
  • Oct 7, 2019
  • Akash Deep Singh + 3 more

Accurate human activity recognition (HAR) is the key to enable emerging context-aware applications that require an understanding and identification of human behavior, e.g., monitoring disabled or elderly people who live alone. Traditionally, HAR has been implemented either through ambient sensors, e.g., cameras, or through wearable devices, e.g., a smartwatch, with an inertial measurement unit (IMU). The ambient sensing approach is typically more generalizable for different environments as this does not require every user to have a wearable device. However, utilizing a camera in privacy-sensitive areas such as a home may capture superfluous ambient information that a user may not feel comfortable sharing. Radars have been proposed as an alternative modality for coarse-grained activity recognition that captures a minimal subset of the ambient information using micro-Doppler spectrograms. However, training fine-grained, accurate activity classifiers is a challenge as low-cost millimeter-wave (mmWave) radar systems produce sparse and non-uniform point clouds. In this paper, we propose RadHAR, a framework that performs accurate HAR using sparse and non-uniform point clouds. RadHAR utilizes a sliding time window to accumulate point clouds from a mmWave radar and generate a voxelized representation that acts as input to our classifiers. We evaluate RadHAR using a low-cost, commercial, off-the-shelf radar to get sparse point clouds which are less visually compromising. We evaluate and demonstrate our system on a collected human activity dataset with 5 different activities. We compare the accuracy of various classifiers on the dataset and find that the best performing deep learning classifier achieves an accuracy of 90.47%. Our evaluation shows the efficacy of using mmWave radar for accurate HAR detection and we enumerate future research directions in this space.
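
The sliding-window accumulation and voxelization described above can be sketched as follows. The grid origin, voxel size, grid shape, window length and stride are all assumptions chosen for illustration, and the function names are hypothetical. The abstract does not state the values RadHAR uses.

```python
def voxelize(points, origin, voxel_size, grid_shape):
    """Count points per voxel.

    Returns a sparse grid as {(i, j, k): count}. Points outside the grid
    are dropped.
    """
    grid = {}
    for p in points:
        idx = tuple(int((p[a] - origin[a]) // voxel_size) for a in range(3))
        if all(0 <= idx[a] < grid_shape[a] for a in range(3)):
            grid[idx] = grid.get(idx, 0) + 1
    return grid


def sliding_windows(frames, window, stride):
    """Yield lists of `window` consecutive frames, advancing by `stride`."""
    for start in range(0, len(frames) - window + 1, stride):
        yield frames[start:start + window]


def voxelized_windows(frames, window, stride, origin, voxel_size, grid_shape):
    """Voxelize every frame in every window (one classifier input per window)."""
    return [
        [voxelize(f, origin, voxel_size, grid_shape) for f in chunk]
        for chunk in sliding_windows(frames, window, stride)
    ]
```

Each element of the result is a time-ordered stack of sparse voxel grids, which a classifier could consume after conversion to a dense tensor.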

  • Dissertation
  • 10.32657/10356/173430
Motion estimation and prediction from 3D point clouds
  • Jan 1, 2024
  • Ruibo Li

Understanding the motion of dynamic environments holds significant benefits for various applications, including robotics and autonomous driving. Scene flow estimation from 3D point clouds, which outputs a per-point 3D motion field between two consecutive time steps, has garnered increasing attention. Although deep learning-based scene flow models have shown promising results, how to capture motion from sparse and irregular point cloud data remains an open question. Furthermore, supervised training of these models demands substantial training data with scene flow annotations, which is both scarce and expensive to collect. To reduce the reliance on scene flow annotations, self-supervised scene flow estimation has emerged as a viable solution, where no annotations are required during training. Apart from scene flow estimation from known point clouds, motion prediction, which generates the future position of point clouds based on past observations, is another active research topic and plays a vital role in path planning and navigation. However, supervised motion prediction methods still rely on abundant motion annotations, while the performance of current self-supervised methods is far from satisfactory. Therefore, motion prediction in a weakly supervised manner is a promising avenue to strike a balance between the effort required for annotations and the performance of models. In this thesis, we study motion estimation and prediction in three different learning paradigms, including fully supervised and self-supervised scene flow estimation, and weakly supervised motion prediction. In fully supervised scene flow estimation, earlier methods treat this task as a per-point regression problem while overlooking the potential rigid motion in local regions. To tackle this limitation, in Chapter 3, we design a new scene flow estimation framework, HCRF-Flow, that effectively integrates the capabilities of Deep Neural Networks and Conditional Random Fields. 
Specifically, HCRF-Flow contains two components. First, it incorporates a DNN-based flow estimation module that performs per-point motion regression. Then, HCRF-Flow employs a new continuous high-order CRFs module to refine the per-point motion predictions by enforcing point-wise smoothness and region-wise rigidity. By leveraging the two components in unison, HCRF-Flow demonstrates superior performance compared to previous methods. In self-supervised scene flow estimation, when scene flow annotations are unavailable, building correspondences between two consecutive point clouds to approximate their scene flow has been shown to be a feasible approach. However, previous methods commonly rely on point-wise matching that solely considers the distance between 3D point coordinates to obtain correspondences. This approach raises two issues: (1) it ignores other discriminative clues; and (2) the matching process is unconstrained, which may lead to a many-to-one problem. To tackle these issues, in Chapter 4, we generate pseudo scene flow with an optimal transport module, which incorporates 3D coordinates, colors, and surface normals as measures and explicitly enforces one-to-one matching. In addition, we design a refinement module to further improve the pseudo scene flow labels by enforcing point-wise smoothness via a random walk algorithm. Although this method demonstrates promising performance, the employed point matching tends to ignore the potential structured motion within local regions, consequently generating inaccurate pseudo labels. Inspired by the local rigidity assumption, in Chapter 5, we propose to generate pseudo labels by piecewise rigid motion estimation. Specifically, by splitting the first point cloud into local regions, we design a piecewise pseudo label generation module that explicitly encourages region-wise rigid alignments between two point clouds, which in turn generates rigid pseudo labels for each region. 
Experimental results show that our method attains state-of-the-art performance in self-supervised scene flow learning. For weakly supervised motion prediction, in Chapter 6, we design a new weakly supervised learning paradigm, where fully or partially annotated foreground/background (FG/BG) masks are utilized instead of expensive motion data for supervision. To this end, we design a two-stage weakly supervised motion prediction framework. In Stage 1, we train an FG/BG segmentation network using partially annotated masks. Then in Stage 2, we train a motion prediction network in a self-supervised manner. Specifically, during Stage 2, the segmentation network from Stage 1 generates foreground points for the training data, enabling the motion prediction network to undergo self-supervised training on these foreground points. Experiments demonstrate that our weakly supervised models, utilizing FG/BG masks as weak supervision, outperform self-supervised models and achieve comparable performance to some supervised models. To the best of our knowledge, we are the first to study motion prediction in a weakly supervised manner.

  • Conference Article
  • Citations: 66
  • 10.1109/cvpr52688.2022.00621
Reconstructing Surfaces for Sparse Point Clouds with On-Surface Priors
  • Jun 1, 2022
  • Baorui Ma + 2 more

It is an important task to reconstruct surfaces from 3D point clouds. Current methods are able to reconstruct surfaces by learning Signed Distance Functions (SDFs) from single point clouds without ground truth signed distances or point normals. However, they require the point clouds to be dense, which dramatically limits their performance in real applications. To resolve this issue, we propose to reconstruct highly accurate surfaces from sparse point clouds with an on-surface prior. We train a neural network to learn SDFs via projecting queries onto the surface represented by the sparse point cloud. Our key idea is to infer signed distances by pushing both the query projections to be on the surface and the projection distance to be the minimum. To achieve this, we train a neural network to capture the on-surface prior to determine whether a point is on a sparse point cloud or not, and then leverage it as a differentiable function to learn SDFs from unseen sparse point clouds. Our method can learn SDFs from a single sparse point cloud without ground truth signed distances or point normals. Our numerical evaluation under widely used benchmarks demonstrates that our method achieves state-of-the-art reconstruction accuracy, especially for sparse point clouds. Code and data are available at https://github.com/mabaorui/OnSurfacePrior.
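
One common way to project a query onto the zero level set of an SDF is to move it by its signed distance along the normalized gradient. The sketch below illustrates that idea only. It assumes an analytic sphere SDF in place of the learned network and a finite-difference gradient in place of automatic differentiation, and all function names are hypothetical.

```python
import math


def sphere_sdf(p, radius=1.0):
    """Signed distance to a sphere at the origin (negative inside)."""
    return math.sqrt(sum(c * c for c in p)) - radius


def numerical_gradient(f, p, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at point p."""
    grad = []
    for a in range(3):
        hi, lo = list(p), list(p)
        hi[a] += eps
        lo[a] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad


def project_to_surface(f, q):
    """Move q by its signed distance along the unit gradient direction.

    Not defined where the gradient vanishes (e.g. the sphere's center).
    """
    d = f(q)
    g = numerical_gradient(f, q)
    norm = math.sqrt(sum(c * c for c in g))
    return [q[a] - d * g[a] / norm for a in range(3)]
```

For an exact SDF a single step lands on the surface, so a query at (0, 0, 3) is moved to approximately (0, 0, 1) on the unit sphere.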

  • Research Article
  • Citations: 2
  • 10.3390/s24072358
Enhancing Building Point Cloud Reconstruction from RGB UAV Data with Machine-Learning-Based Image Translation
  • Apr 8, 2024
  • Sensors
  • Elisabeth Johanna Dippold + 1 more

The performance of three-dimensional (3D) point cloud reconstruction is affected by dynamic features such as vegetation. Vegetation can be detected by near-infrared (NIR)-based indices; however, the sensors providing multispectral data are resource intensive. To address this issue, this study proposes a two-stage framework to firstly improve the performance of the 3D point cloud generation of buildings with a two-view SfM algorithm, and secondly, reduce noise caused by vegetation. The proposed framework can also overcome the lack of near-infrared data when identifying vegetation areas for reducing interferences in the SfM process. The first stage includes cross-sensor training, model selection and the evaluation of image-to-image RGB to color infrared (CIR) translation with Generative Adversarial Networks (GANs). The second stage includes feature detection with multiple feature detector operators, feature removal with respect to the NDVI-based vegetation classification, masking, matching, pose estimation and triangulation to generate sparse 3D point clouds. The materials utilized in both stages are a publicly available RGB-NIR dataset, and satellite and UAV imagery. The experimental results indicate that the cross-sensor and category-wise validation achieves an accuracy of 0.9466 and 0.9024, with a kappa coefficient of 0.8932 and 0.9110, respectively. The histogram-based evaluation demonstrates that the predicted NIR band is consistent with the original NIR data of the satellite test dataset. Finally, the test on the UAV RGB and artificially generated NIR with a segmentation-driven two-view SfM proves that the proposed framework can effectively translate RGB to CIR for NDVI calculation. Further, the artificially generated NDVI is able to segment and classify vegetation. As a result, the generated point cloud is less noisy, and the 3D model is enhanced.
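
The NDVI used above for vegetation classification is the standard index (NIR - Red) / (NIR + Red). A minimal sketch of computing it per pixel and thresholding it into a vegetation mask follows. The threshold of 0.3 and the function names are assumptions for illustration, not values from the paper.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def vegetation_mask(nir_band, red_band, threshold=0.3):
    """Mark pixels whose NDVI exceeds the threshold (assumed value) as vegetation.

    Bands are lists of rows of reflectance values with identical shapes.
    """
    return [
        [ndvi(n, r) > threshold for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]
```

Features that fall on masked pixels could then be discarded before matching, which is the role the abstract assigns to the NDVI-based classification.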

  • Research Article
  • Citations: 18
  • 10.1016/j.patcog.2023.109796
APUNet: Attention-guided upsampling network for sparse and non-uniform point cloud
  • Jun 30, 2023
  • Pattern Recognition
  • Tianming Zhao + 4 more

  • Research Article
  • Citations: 2
  • 10.1088/1742-6596/2216/1/012028
Multi-sensor fusion of sparse point clouds based on neural networks
  • Mar 1, 2022
  • Journal of Physics: Conference Series
  • Qiming Yang + 5 more

The fusion of laser point clouds and visual images depends on the point cloud density and on how well targets are framed. Traditional laser point cloud processing clusters sparse point clouds poorly, so it is difficult to frame small objects as well as medium- and long-distance objects, and the subsequent sensor fusion easily misses obstacles. In this paper, we improve the frame selection method for sparse point clouds. We first build the deep learning framework PointPillars and use it to frame the sparse laser point clouds. We then spatially calibrate the lidar and camera coordinate systems, project the lidar point clouds onto the camera image, and improve the late fusion method so that the detection results of each single sensor are used effectively. Finally, late fusion is performed with the target detection results of the camera image to output the exact distance and category of the target. Experiments show that, compared with the traditional fusion algorithm, the number of frames is increased by 6 and the missed recognition rate is reduced from 31.41% to 12.31%.

  • Research Article
  • Citations: 57
  • 10.1109/lra.2021.3068712
Volumetric Propagation Network: Stereo-LiDAR Fusion for Long-Range Depth Estimation
  • Mar 25, 2021
  • IEEE Robotics and Automation Letters
  • Jaesung Choe + 3 more

Stereo-LiDAR fusion is a promising task in that we can utilize two different types of 3D perceptions for practical usage - dense 3D information (stereo cameras) and highly-accurate sparse point clouds (LiDAR). However, due to their different modalities and structures, the method of aligning sensor data is the key for successful sensor fusion. To this end, we propose a geometry-aware stereo-LiDAR fusion network for long-range depth estimation, called volumetric propagation network. The key idea of our network is to exploit sparse and accurate point clouds as a cue for guiding correspondences of stereo images in a unified 3D volume space. Unlike existing fusion strategies, we directly embed point clouds into the volume, which enables us to propagate valid information into nearby voxels in the volume, and to reduce the uncertainty of correspondences. Thus, it allows us to fuse two different input modalities seamlessly and regress a long-range depth map. Our fusion is further enhanced by a newly proposed feature extraction layer for point clouds guided by images: FusionConv. FusionConv extracts point cloud features that consider both semantic (2D image domain) and geometric (3D domain) relations and aid fusion at the volume. Our network achieves state-of-the-art performance on KITTI and Virtual-KITTI datasets among recent stereo-LiDAR fusion methods.

  • Research Article
  • Citations: 10
  • 10.1109/lra.2022.3221313
Multi-Level Structure-Enhanced Network for 3D Single Object Tracking in Sparse Point Clouds
  • Jan 1, 2023
  • IEEE Robotics and Automation Letters
  • Qiaoyun Wu + 2 more

3D single object tracking is the task of localizing a target object in a search point cloud frame. In this letter, we present a multi-level structure-enhanced tracking model to improve the tracking performance in sparse 3D point clouds. Towards this end, we first encode the target and the search point clouds efficiently in two near-neighbors graphs, which allows structural information to flow between neighbors along graph edges. We then design a cross-graph attention mechanism, which associates similar nodes across the target graph and the search graph, and pushes dissimilar graph nodes further apart. Integrating the proposed mechanism into the above Siamese feature learning of both the target and the search frame, we strengthen the structural correlation between the target and the search frame. In that case, distinguishing the potential target from the background in the search frame would be much simpler. Finally, we design a U-shaped sparse convolutional block to aggregate the structural features of the potential target in the search frame. Integrating the proposed block into an existing target localization module from (Hui et al., 2021), we localize target centers accurately. Experiments on the KITTI benchmark demonstrate that our method outperforms some state-of-the-art models, achieving at least a 3.2% improvement in terms of average tracking precision.
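
The near-neighbors graphs mentioned above can be illustrated with a brute-force k-nearest-neighbor construction. This is a sketch under assumptions: it uses an O(N^2) search with squared Euclidean distance, whereas a practical system would likely use a spatial index, and the function name knn_graph is hypothetical.

```python
def knn_graph(points, k):
    """Build a k-nearest-neighbor graph as an adjacency list.

    For each point index i, returns the indices of its k closest other
    points, ordered from nearest to farthest (ties broken by index).
    """
    graph = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        graph.append([j for _, j in dists[:k]])
    return graph
```

Features can then be exchanged along these edges, which is what allows structural information to flow between neighboring points.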

  • Research Article
  • Citations: 68
  • 10.1109/tip.2022.3180904
PU-Dense: Sparse Tensor-Based Point Cloud Geometry Upsampling
  • Jan 1, 2022
  • IEEE Transactions on Image Processing
  • Anique Akhtar + 4 more

Due to the increased popularity of augmented and virtual reality experiences, the interest in capturing high-resolution real-world point clouds has never been higher. Loss of details and irregularities in point cloud geometry can occur during the capturing, processing, and compression pipeline. It is essential to address these challenges by being able to upsample a low Level-of-Detail (LoD) point cloud into a high LoD point cloud. Current upsampling methods suffer from several weaknesses in handling point cloud upsampling, especially in dense real-world photo-realistic point clouds. In this paper, we present a novel geometry upsampling technique, PU-Dense, which can process a diverse set of point clouds including synthetic mesh-based point clouds, real-world high-resolution point clouds, real-world indoor LiDAR scanned objects, as well as outdoor dynamically acquired LiDAR-based point clouds. PU-Dense employs a 3D multiscale architecture using sparse convolutional networks that hierarchically reconstruct an upsampled point cloud geometry via progressive rescaling and multiscale feature extraction. The framework employs a UNet type architecture that downscales the point cloud to a bottleneck and then upscales it to a higher level-of-detail (LoD) point cloud. PU-Dense introduces a novel Feature Extraction Unit that incorporates multiscale spatial learning by employing filters at multiple sampling rates and receptive fields. The architecture is memory efficient and is driven by a binary voxel occupancy classification loss that allows it to process high-resolution dense point clouds with millions of points during inference time. Qualitative and quantitative experimental results show that our method significantly outperforms the state-of-the-art approaches by a large margin while having much lower inference time complexity. We further test our dataset on high-resolution photo-realistic datasets. In addition, our method can handle noisy data well. 
We further show that our approach is memory efficient compared to the state-of-the-art methods.
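
A binary voxel occupancy classification loss of the kind mentioned above is typically a binary cross-entropy between predicted occupancy probabilities and 0/1 ground-truth occupancy. The sketch below shows that generic form over a flat list of voxels. It is not PU-Dense's implementation, and the clipping constant, the 0.5 threshold and the function names are assumptions.

```python
import math


def occupancy_bce(pred_probs, target_occupancy, eps=1e-7):
    """Mean binary cross-entropy between predicted and true voxel occupancy.

    pred_probs are probabilities in [0, 1]; targets are 0 (empty) or 1 (occupied).
    Probabilities are clipped by eps (assumed value) to avoid log(0).
    """
    total = 0.0
    for p, t in zip(pred_probs, target_occupancy):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred_probs)


def threshold_occupancy(pred_probs, threshold=0.5):
    """Turn probabilities into occupied (1) or empty (0) decisions."""
    return [1 if p >= threshold else 0 for p in pred_probs]
```

At inference, the voxels classified as occupied would give the upsampled geometry, so the loss directly scores which cells should contain points.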
