Karlsruhe Institute Of Technology And Toyota Technological Institute Research Articles

Background: Because they refine regions of interest (RoIs), two-stage 3D detection algorithms usually outperform most single-stage detectors. However, most two-stage methods use feature concatenation to aggregate the grid point features obtained by multi-scale RoI pooling in the second stage. This concatenation does not consider the correlation between multi-scale grid features. Methods: In the first stage, we employ 3D sparse convolution and 2D convolution to extract rich semantic features. Then, a region proposal network (RPN) predicts a small number of coarse RoIs on the generated bird’s eye view (BEV) map. After that, we adopt a voxel RoI-pooling strategy to aggregate, for each grid point in an RoI, the neighboring non-empty voxel features from the last two layers of the 3D sparse convolution. In this way, we obtain two aggregated features from the 3D sparse voxel space for each grid point. Next, we design an attention feature fusion module. This module includes a local and a global attention layer, which integrate the grid point features from different voxel layers. Results: We carry out experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset. The average precisions of our proposed method are 88.21%, 81.51%, and 77.07% on the easy, moderate, and hard difficulty levels, respectively, for 3D detection, and 92.30%, 90.19%, and 86.00% on the same levels for BEV detection. Conclusions: In this paper, we propose a novel two-stage 3D detection algorithm for point clouds named Grid Attention Fusion Region-based Convolutional Neural Network (GAF-RCNN). Because we integrate multi-scale RoI grid features with an attention mechanism in the refinement stage, the multi-scale features are better correlated, and the method achieves competitive performance compared with other well-tested detection algorithms.
This 3D object detection approach has important implications for robot and cobot technology.
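
The fusion step described in the Methods can be illustrated with a minimal sketch. It is not the GAF-RCNN implementation: the abstract does not specify the layer design, so the function name `attention_fuse`, the use of a sigmoid gate, and the replacement of the learned local and global attention layers by simple identity and mean operations are all assumptions made for illustration. The sketch only shows the general idea of weighting two per-grid-point feature sets by an attention value instead of concatenating them.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def attention_fuse(feat_a, feat_b):
    """Fuse two feature sets shaped [num_grid_points][channels].

    Hypothetical stand-in for a learned attention fusion module:
    - local context  = the summed feature at each grid point
    - global context = the channel-wise mean over all grid points
    A real module would pass both through learned layers first.
    """
    num_points = len(feat_a)
    channels = len(feat_a[0])

    # Element-wise sum of the two inputs, used to compute the attention.
    summed = [[a + b for a, b in zip(row_a, row_b)]
              for row_a, row_b in zip(feat_a, feat_b)]

    # Global context: one value per channel, averaged over grid points.
    global_ctx = [sum(row[ch] for row in summed) / num_points
                  for ch in range(channels)]

    fused = []
    for row_a, row_b, row_s in zip(feat_a, feat_b, summed):
        out = []
        for ch in range(channels):
            # Attention weight in (0, 1) from local plus global context.
            w = sigmoid(row_s[ch] + global_ctx[ch])
            # Soft selection between the two feature sources.
            out.append(w * row_a[ch] + (1.0 - w) * row_b[ch])
        fused.append(out)
    return fused
```

Each fused value is a convex combination of the two inputs, so the output keeps the same shape as either input, whereas concatenation would double the channel count.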

Background: 3D object detection based on point clouds in road scenes has attracted much attention recently. Voxel-based methods voxelize the scene into regular grids, which can be processed with advanced convolutional feature learning frameworks for semantic feature learning. Point-based methods can extract the geometric features of points because the point coordinates are preserved. The combination of the two is effective for 3D object detection. However, current methods use a voxel-based detection head with anchors for classification and localization. Although the preset anchors cover the entire scene, they are not suitable for detection tasks with larger scenes and multiple categories of objects, due to the limitation of the voxel size. Additionally, the misalignment between the predicted confidence and the proposals during region of interest (ROI) selection hinders 3D object detection. Methods: We investigate the combination of voxel-based and point-based methods for 3D object detection. We propose a voxel-to-point module that captures semantic and geometric features. The voxel-to-point module helps detect small objects and avoids preset anchors in the inference stage. Moreover, we propose a confidence adjustment module with center-boundary-aware confidence attention to resolve the misalignment between the predicted confidence and the proposals during ROI selection. Results: The proposed method achieves state-of-the-art results for 3D object detection on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) object detection dataset. As of September 19, 2021, our method ranked 1st in 3D and bird’s eye view (BEV) detection of cyclists at the ‘easy’ difficulty level, and ranked 2nd in 3D detection of cyclists at the ‘moderate’ level. Conclusions: We propose an end-to-end two-stage 3D object detector with a voxel-to-point module and a confidence adjustment module.
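
The voxelization mentioned in the Background can be sketched as follows. This is a generic illustration of binning points into a regular grid, not the authors' pipeline: the function name `voxelize`, the `origin` parameter, and the choice to store only non-empty voxels in a dictionary are assumptions made for the example.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Group 3D points by the integer index of the voxel containing them.

    points:     iterable of (x, y, z) tuples
    voxel_size: (dx, dy, dz) edge lengths of one voxel
    origin:     coordinate of the grid corner (hypothetical parameter)
    Returns a dict mapping (ix, iy, iz) to the list of points inside it.
    Only non-empty voxels appear as keys.
    """
    voxels = defaultdict(list)
    for p in points:
        # floor() keeps negative coordinates in the correct cell.
        idx = tuple(
            math.floor((p[i] - origin[i]) / voxel_size[i]) for i in range(3)
        )
        voxels[idx].append(p)
    return dict(voxels)


if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.1), (1.5, 0.0, 0.0)]
    grid = voxelize(cloud, voxel_size=(1.0, 1.0, 1.0))
    for index, members in sorted(grid.items()):
        print(index, len(members))
```

The voxel size fixes the finest spatial detail the grid can represent, which is the limitation the abstract points to for small objects and large scenes.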

Articles published on Karlsruhe Institute Of Technology And Toyota Technological Institute

Z-YOLOv8s-based approach for road object recognition in complex traffic scenarios

Monocular Depth Estimation Based on Dilated Convolutions and Feature Fusion

Curvature Scale Space LiDAR Odometry And Mapping (LOAM)

3D VAE Video Prediction Model with Kullback Leibler Loss Enhancement

Distance Transform Pooling Neural Network for LiDAR Depth Completion

INS/LIDAR/Stereo SLAM Integration for Precision Navigation in GNSS-Denied Environments

Monocular Distance Estimation-based Approach using Deep Artificial Neural Network

GAF-RCNN: Grid attention fusion 3D object detection from point cloud

MFF-Net: Multimodal Feature Fusion Network for 3D Object Detection

Vehicle Detection in Challenging Scenes Using CenterNet Based Approach

A Deep Learning Framework for Robust and Real-Time Taillight Detection Under Various Road Conditions

Self-Supervised Monocular Depth Estimation Using Hybrid Transformer Encoder

Multiple Object Tracking in Robotic Applications: Trends and Challenges

Anomaly recognition method of perception system for autonomous vehicles based on distance metric

Monocular three-dimensional object detection using data augmentation and self-supervised learning in autonomous driving

3D object detection combining semantic and geometric features from point clouds

Residual 3-D Scene Flow Learning With Context-Aware Feature Extraction

Cooperative Visual Augmentation Algorithm of Intelligent Vehicle Based on Inter-Vehicle Image Fusion

SFGAN: Unsupervised Generative Adversarial Learning of 3D Scene Flow from the 3D Scene Self

Lane Lines Detection under Complex Environment by Fusion of Detection and Prediction Models
