MonoFG: Monocular 3D Object Detection with Knowledge Distillation for Human-Centric Autonomous Driving Systems

Honghao Gao, Xinxin Yu, Yueshen Xu, Qionghuizi Ran, Walayat Hussain

doi:10.1145/3703458

Abstract

Monocular 3D object detection is essential for identifying objects in road images, thus offering valuable environmental perception data that are crucial for human-centric autonomous driving systems. However, due to the inherent limitations of camera imaging, obtaining precise depth information from images alone is challenging, which hampers the accuracy of within-scene object localization. In this paper, we introduce a monocular 3D object detection method called MonoFG that uses knowledge distillation with separated foreground and background components to improve the accuracy of object localization. First, detached foreground and background distillation processes can strategically leverage the distinct positional information acquired from each location to optimize the produced global distillation effects. This step serves as the foundation for the subsequent feature and response distillation process, which focuses on the distilled foreground and background rather than isolated object distillation. Second, triple attention mechanism-based feature distillation intensifies the feature imitation and feature representation capabilities of the student network. Spatial and channel attention mechanisms encourage the student network to capture crucial pixels and channels from the teacher network, whereas a self-attention mechanism globally transfers the learned relationships between pixels. Third, localization error-based response distillation facilitates a clearer transfer of positional information from the teacher network to the student network. Only when the positioning ability of the teacher network exceeds that of the student network can knowledge be comprehensively distilled across both the foreground and background. Therefore, the distillation process is constrained to specific content, which is delineated by positioning errors that serve as the boundaries. 
Finally, experiments conducted on the KITTI benchmark dataset demonstrate that our method outperforms many well-known baseline methods in several representative evaluation tasks (e.g., 3D object detection and bird's-eye view (BEV) detection) involving human-centric autonomous driving systems.
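
The abstract does not give loss formulas, so the following is only a minimal sketch of two of the ideas it describes: imitating teacher features separately over foreground and background regions, and distilling responses only where the teacher localizes better than the student. Everything concrete here is an assumption made for illustration, not the authors' formulation: the function names, the squared-error and absolute-error losses, the region weights, and the use of scalar depth values as the "response". The triple attention mechanism is not modeled.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in feature-map cells, x2/y2 exclusive.
Box = Tuple[int, int, int, int]
Grid = List[List[float]]


def foreground_mask(height: int, width: int, boxes: List[Box]) -> List[List[int]]:
    """Binary mask: 1 inside any ground-truth box (foreground), 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                mask[y][x] = 1
    return mask


def separated_feature_loss(student: Grid, teacher: Grid, mask: List[List[int]],
                           fg_weight: float = 1.0, bg_weight: float = 0.5) -> float:
    """Mean squared imitation error, averaged separately over the
    foreground and background cells and then combined with assumed weights."""
    fg_sum = bg_sum = 0.0
    fg_n = bg_n = 0
    for s_row, t_row, m_row in zip(student, teacher, mask):
        for s, t, m in zip(s_row, t_row, m_row):
            err = (s - t) ** 2
            if m:
                fg_sum += err
                fg_n += 1
            else:
                bg_sum += err
                bg_n += 1
    fg = fg_sum / fg_n if fg_n else 0.0
    bg = bg_sum / bg_n if bg_n else 0.0
    return fg_weight * fg + bg_weight * bg


def gated_response_loss(student_pred: List[float], teacher_pred: List[float],
                        target: List[float]) -> float:
    """Mean absolute student-teacher gap, counted only at positions where the
    teacher's error against the ground truth is smaller than the student's."""
    total = 0.0
    count = 0
    for s, t, g in zip(student_pred, teacher_pred, target):
        if abs(t - g) < abs(s - g):  # teacher localizes better here
            total += abs(s - t)
            count += 1
    return total / count if count else 0.0
```

Averaging the two regions separately keeps the few foreground cells from being swamped by the much larger background, which is one plausible reading of why the abstract treats them as detached processes. The gate in `gated_response_loss` mirrors the stated condition that knowledge is transferred only when the teacher's positioning ability exceeds the student's.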


More From: ACM Transactions on Autonomous and Adaptive Systems


