3D Object Detection Research Articles

Monocular 3D object detection identifies objects in road images and thus provides environmental perception data that are crucial for human-centric autonomous driving systems. However, because of the inherent limitations of camera imaging, obtaining precise depth information from images alone is challenging, which limits how accurately objects can be localized within a scene. In this paper, we introduce MonoFG, a monocular 3D object detection method that uses knowledge distillation with separated foreground and background components to improve object localization accuracy. First, separate foreground and background distillation processes exploit the distinct positional information that each region provides, improving the overall (global) distillation effect. This step serves as the foundation for the subsequent feature and response distillation, which operate on the separated foreground and background rather than on isolated objects. Second, feature distillation based on a triple attention mechanism strengthens the feature imitation and feature representation capabilities of the student network. Spatial and channel attention encourage the student network to capture the crucial pixels and channels of the teacher network, whereas self-attention transfers the learned relationships between pixels globally. Third, response distillation based on localization error makes the transfer of positional information from the teacher network to the student network more explicit. Knowledge can be comprehensively distilled across both the foreground and background only when the teacher network localizes objects better than the student network. Therefore, distillation is restricted to specific content whose boundaries are delineated by positioning errors.
Finally, experiments on the KITTI benchmark dataset demonstrate that our method outperforms many well-known baseline methods on several representative evaluation tasks relevant to human-centric autonomous driving systems (e.g., 3D object detection and bird's-eye view (BEV) detection).
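The abstract describes the distillation losses only in words. The following is a minimal sketch, assuming plain Python lists for per-pixel features and 3D box centres, of two of the ideas: a feature-imitation loss averaged separately over foreground and background pixels (optionally weighted by a teacher-derived spatial attention map), and a response loss applied only where the teacher's localization error is lower than the student's. The function names, the weights `alpha` and `beta`, the `temperature`, and the use of L1 error on box centres are hypothetical choices for illustration, not MonoFG's actual formulation; channel attention and self-attention are omitted.

```python
import math


def spatial_attention(teacher, temperature=0.5):
    """Hypothetical spatial attention map: softmax over pixels of the teacher's
    mean absolute activation, rescaled so the weights sum to the pixel count."""
    scores = [sum(abs(v) for v in pix) / len(pix) / temperature for pix in teacher]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [len(teacher) * e / total for e in exps]


def fg_bg_feature_loss(student, teacher, fg_mask, weights=None, alpha=1.0, beta=0.5):
    """Squared-error imitation loss averaged separately over foreground and
    background pixels, then combined as alpha * fg + beta * bg.

    student, teacher: lists of per-pixel feature vectors (same shape).
    fg_mask: one 0/1 flag per pixel (1 = pixel lies inside a ground-truth box).
    weights: optional per-pixel attention weights; defaults to all ones.
    """
    if weights is None:
        weights = [1.0] * len(student)
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for s_pix, t_pix, is_fg, w in zip(student, teacher, fg_mask, weights):
        err = sum((s - t) ** 2 for s, t in zip(s_pix, t_pix))
        sums[bool(is_fg)] += w * err
        counts[bool(is_fg)] += 1
    # Normalising each region by its own pixel count keeps the usually much
    # larger background from swamping the foreground term.
    fg = sums[True] / counts[True] if counts[True] else 0.0
    bg = sums[False] / counts[False] if counts[False] else 0.0
    return alpha * fg + beta * bg


def gated_response_loss(student_centers, teacher_centers, gt_centers):
    """L1 distillation toward the teacher's predicted box centres, applied only
    to objects where the teacher's localization error (L1 distance to the
    ground-truth centre) is smaller than the student's."""
    total, used = 0.0, 0
    for s, t, g in zip(student_centers, teacher_centers, gt_centers):
        t_err = sum(abs(a - b) for a, b in zip(t, g))
        s_err = sum(abs(a - b) for a, b in zip(s, g))
        if t_err < s_err:  # teacher is more accurate here, so imitate it
            total += sum(abs(a - b) for a, b in zip(s, t))
            used += 1
    return total / used if used else 0.0
```

The attention map plugs in as `fg_bg_feature_loss(s, t, mask, weights=spatial_attention(t))`. In a real detector these terms would operate on batched tensors and be added, with tuned weights, to the student's own detection loss; the gate means the response term contributes nothing for objects the student already localizes better than the teacher.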

Accurate recognition and localization of 3D objects is a fundamental research problem in 3D computer vision. Benefiting from transformation-free point cloud processing and flexible receptive fields, point-based methods model 3D point clouds accurately but still lag behind voxel-based competitors in 3D detection. We observe that the set abstraction module, which point-based methods commonly use to downsample points, tends to retain excessive irrelevant background information, which hinders the learning of effective features for object detection. To address this issue, we propose MSSA, a Multi-representation Semantics-augmented Set Abstraction for 3D object detection. Specifically, we first design a backbone network that encodes different representations of the point cloud: it extracts point-wise features through PointNet to preserve fine-grained geometric structure, and adopts VoxelNet to extract voxel and BEV features that enhance the semantic features of keypoints. Second, to fuse the different representation features of keypoints efficiently, we propose a Point feature-guided Voxel feature and BEV feature fusion (PVB-Fusion) module that adaptively fuses multi-representation features and removes noise. Finally, we design a novel Multi-representation Semantic-guided Farthest Point Sampling (MS-FPS) algorithm that helps the set abstraction modules progressively downsample point clouds while keeping more of the important foreground points, thereby improving instance recall and detection performance. We evaluate MSSA on the widely used KITTI dataset and the more challenging nuScenes dataset. Experimental results show that, compared with PointRCNN, our method improves the AP at the “moderate” difficulty level by 7.02%, 6.76%, and 5.44% for three object classes, respectively. Compared with the advanced point-voxel-based method PV-RCNN, it improves the “moderate” AP by 1.23%, 2.84%, and 0.55% for the three classes, respectively.
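The abstract does not specify how MS-FPS weights points. A minimal sketch, assuming a common semantics-weighted variant of farthest point sampling in which each candidate's distance to the already-sampled set is multiplied by its foreground score raised to a power `gamma`, is shown below. The function name `semantic_fps`, the `gamma` parameter, the rule of starting from the highest-scoring point, and the single scalar score per point are all assumptions; MS-FPS itself derives its semantics from multi-representation features.

```python
import math


def semantic_fps(points, scores, k, gamma=1.0):
    """Hypothetical semantics-weighted farthest point sampling.

    points: list of (x, y, z) tuples; scores: foreground probability per point.
    Each step picks the unselected point maximising
    score**gamma * (distance to its nearest already-selected point).
    Returns the indices of the selected points, in selection order.
    """
    n = len(points)
    if k <= 0 or n == 0:
        return []
    k = min(k, n)
    # Start from the most confident foreground point (an assumption; classic
    # FPS starts from an arbitrary or first point).
    first = max(range(n), key=lambda i: scores[i])
    selected, chosen = [first], {first}
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(selected) < k:
        best, best_val = -1, -1.0
        for i in range(n):
            if i in chosen:
                continue
            val = (scores[i] ** gamma) * min_dist[i]
            if val > best_val:
                best, best_val = i, val
        selected.append(best)
        chosen.add(best)
        # Update each point's distance to its nearest selected point.
        for i, p in enumerate(points):
            d = math.dist(p, points[best])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

With `gamma=0.0` the score factor is always 1 and the procedure reduces to ordinary FPS. For example, with points `[(0, 0, 0), (1, 0, 0), (10, 0, 0), (0.5, 0, 0)]` and scores `[0.9, 0.8, 0.05, 0.7]`, `semantic_fps(points, scores, 2)` returns `[0, 1]`, whereas `gamma=0.0` returns `[0, 2]`, spending a sample on the distant low-score (likely background) point.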

Articles published on 3D Object Detection

CO-Net++: A Cohesive Network for Multiple Point Cloud Tasks at Once With Two-Stage Feature Rectification

UADA3D: Unsupervised Adversarial Domain Adaptation for 3D Object Detection With Sparse LiDAR and Large Domain Gaps

Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection

Contextual Attribution Maps-Guided Transferable Adversarial Attack for 3D Object Detection

A Component-based Systematic Review of Pool-based Active Learning for 2D Object Detection

Research on Traffic Scene Element Recognition for Autonomous Driving Based on Deep Learning

Sec-CLOCs: Multimodal Back-End Fusion-Based Object Detection Algorithm in Snowy Scenes

PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles

MS3D: A Multi-Scale Feature Fusion 3D Object Detection Method for Autonomous Driving Applications

Advanced Point Cloud Techniques for Improved 3D Object Detection: A Study on DBSCAN, Attention, and Downsampling

Efficient indoor 3D object detection in point clouds using the Kinect sensor

MonoFG: Monocular 3D Object Detection with Knowledge Distillation for Human-Centric Autonomous Driving Systems

Bi-directional information interaction for multi-modal 3D object detection in real-world traffic scenes

Point cloud segmentation method based on an image mask and its application verification

Exploring Diversity-Based Active Learning for 3D Object Detection in Autonomous Driving

DeployFusion: A Deployable Monocular 3D Object Detection with Multi-Sensor Information Fusion in BEV for Edge Devices

CWGA-Net: Center-Weighted Graph Attention Network for 3D object detection from point clouds

PE-MCAT: Leveraging Image Sensor Fusion and Adaptive Thresholds for Semi-Supervised 3D Object Detection

MSSA: Multi-Representation Semantics-Augmented Set Abstraction for 3D Object Detection

MonoCAPE: Monocular 3D object detection with coordinate-aware position embeddings