Abstract
Robust object detection in challenging scenarios remains a critical problem for autonomous driving systems. Inspired by human visual perception, fusing the complementary modalities of RGB frames and event streams is a promising route to robust object detection. However, existing multimodal object detectors attain their superior performance at the cost of substantial power consumption. To address this challenge, we propose a novel spiking RGB–event fusion-based detection network (SFDNet), a fully spiking object detector that achieves both low power consumption and high detection performance. Specifically, we first introduce the Leaky Integrate-and-Multi-Fire (LIMF) neuron model, which combines soft and hard reset mechanisms to enhance feature representation in SNNs. We then develop a multi-scale hierarchical spiking residual attention network and a lightweight spiking aggregation module for efficient dual-modality feature extraction and fusion. Experimental results on two public multimodal object detection datasets demonstrate that SFDNet achieves state-of-the-art performance with remarkably low power consumption. Its superior performance in challenging scenarios, such as motion blur and low-light conditions, highlights the robustness and effectiveness of SFDNet and significantly advances the applicability of SNNs to real-world object detection tasks.
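The abstract does not spell out the LIMF dynamics; as a rough intuition for how a multi-fire neuron might combine soft and hard resets, the following is a minimal numpy sketch. All names and dynamics here (limf_step, tau, v_th, max_spikes, the per-step multi-fire rule) are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def limf_step(v, x, tau=2.0, v_th=1.0, max_spikes=3):
    """One timestep of a hypothetical Leaky Integrate-and-Multi-Fire neuron.

    Assumed dynamics (not taken from the paper): leaky integration,
    then up to `max_spikes` spikes per step (multi-fire). A soft reset
    (threshold subtraction) preserves residual potential; a hard reset
    to zero is applied when the potential exceeds the multi-fire ceiling.
    """
    # Leaky integration of the input current toward x.
    v = v + (x - v) / tau
    # Multi-fire: count how many thresholds the potential crosses.
    n_spikes = int(np.floor(max(v, 0.0) / v_th))
    if n_spikes == 0:
        return 0, v  # no spike; membrane potential carries over
    if n_spikes <= max_spikes:
        # Soft reset: subtract the fired charge, keeping the residual.
        return n_spikes, v - n_spikes * v_th
    # Hard reset: saturate the spike count and discard the residual.
    return max_spikes, 0.0

# Toy usage: drive the neuron with a constant input and record spike counts.
v, spikes = 0.0, []
for _ in range(10):
    s, v = limf_step(v, x=1.5)
    spikes.append(s)
print(spikes)
```

The intuition this sketch tries to capture is the one stated in the abstract: soft resets retain sub-threshold information across timesteps (richer feature representation), while hard resets bound activity, and the multi-fire mechanism lets a single neuron convey graded magnitudes per step.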