Abstract

Target detection is a pivotal research focus in autonomous driving. The process fundamentally relies on on-board sensors to identify nearby objects. However, unstable long-range perception and partial occlusion by other traffic participants complicate the task. These factors collectively degrade target detection in the cooperative vehicle-infrastructure system (CVIS) and urgently need to be addressed. Starting from infrastructure-side assisted detection, we propose a 3D detection network based on multi-sensor sensing. Our approach consists of three sub-networks: the Diversity Balanced Feature Fusion Network (MRBNeXt), the Early Multimodal Fusion Network (VBRFusion), and the TwoStage Lightweight Detection Network (TSL). MRBNeXt fuses features extracted from raw images into multilevel semantic representations, addressing the lack of a rich feature-level representation. VBRFusion uses a two-branch structure that operates on the voxelized point cloud to aggregate high-dimensional features; the point features are then mapped to the sampled image semantic features via the coordinate features to complete early fusion and thus improve on the feature quality of a single modality. In the region proposal network, TSL applies a context-aware module in a two-stage approach to enlarge the receptive field over multidimensional features, enabling fast target recognition at different scales. Finally, we perform comparative and ablation experiments on the DAIR-V2X vehicle-infrastructure dataset. The results validate our approach and demonstrate improved detection accuracy at the infrastructure end compared to current state-of-the-art methods. This improvement significantly boosts the performance of 3D target detection in complex traffic scenarios and provides a more robust foundation for the subsequent development of vehicle-side and infrastructure-side collaborative 3D target detection.
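
The early fusion step summarized above, in which point features are matched to image features through their coordinates, can be illustrated with a minimal NumPy sketch. This is not the VBRFusion implementation: the function name `fuse_points_with_image_features`, the 3x4 projection matrix `proj`, and the nearest-pixel sampling are all assumptions made for illustration. It only shows the general idea of projecting each 3D point into the image plane and appending the image feature found there.

```python
import numpy as np

def fuse_points_with_image_features(points, point_feats, image_feats, proj):
    """Append sampled image features to each point's own features.

    points:      (N, 3) xyz coordinates in the LiDAR frame
    point_feats: (N, Cp) per-point features
    image_feats: (H, W, Ci) image feature map
    proj:        (3, 4) hypothetical LiDAR-to-pixel projection matrix
    Returns the fused (N, Cp + Ci) features and a boolean mask of the
    points that landed inside the image.
    """
    n = points.shape[0]
    homo = np.hstack([points, np.ones((n, 1))])  # homogeneous coords, (N, 4)
    cam = homo @ proj.T                           # projected coords, (N, 3)

    depth = cam[:, 2]
    in_front = depth > 1e-6                       # drop points behind the camera
    safe_depth = np.where(in_front, depth, 1.0)   # avoid dividing by <= 0

    # Nearest-pixel sampling (an assumption; bilinear sampling is also common).
    u = np.round(cam[:, 0] / safe_depth).astype(int)
    v = np.round(cam[:, 1] / safe_depth).astype(int)

    h, w, ci = image_feats.shape
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # Points outside the image keep a zero image feature.
    sampled = np.zeros((n, ci), dtype=image_feats.dtype)
    sampled[valid] = image_feats[v[valid], u[valid]]

    return np.hstack([point_feats, sampled]), valid


# Toy example with an identity-like projection (focal length 1, no offset).
proj = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]])
points = np.array([[2.0, 1.0, 1.0],    # projects to pixel (u=2, v=1)
                   [0.0, 0.0, -1.0],   # behind the camera
                   [10.0, 0.0, 1.0]])  # outside the 4-pixel-wide map
point_feats = np.array([[7.0], [8.0], [9.0]])
image_feats = np.arange(24, dtype=float).reshape(3, 4, 2)

fused, valid = fuse_points_with_image_features(
    points, point_feats, image_feats, proj)
```

In this toy setup only the first point falls inside the feature map, so it alone receives a non-zero image feature; the other two keep their original point feature padded with zeros.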
