Spiking neural networks (SNNs), known for their low-power, event-driven computation and intrinsic temporal dynamics, are emerging as promising solutions for processing dynamic, asynchronous signals from event-based sensors. Despite their potential, SNNs face challenges in training and architectural design, which limits their performance on challenging event-based dense prediction tasks compared with artificial neural networks (ANNs). In this work, we develop an efficient spiking encoder-decoder network (SpikingEDN) for large-scale event-based semantic segmentation (EbSS) tasks. To enhance learning efficiency from dynamic event streams, we employ an adaptive threshold, which improves network accuracy, sparsity, and robustness in streaming inference. Moreover, we develop a dual-path spiking spatially adaptive modulation (SSAM) module, specifically tailored to enhance the representation of sparse events and multimodal inputs, thereby considerably improving network performance. Our SpikingEDN attains a mean intersection over union (MIoU) of 72.57% on the DDD17 dataset and 58.32% on the larger DSEC-Semantic dataset, results competitive with state-of-the-art ANNs while requiring substantially fewer computational resources. Our results shed light on the untapped potential of SNNs in event-based vision applications. The source code is publicly available at https://github.com/EMI-Group/spikingedn.
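For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection over union is commonly computed from per-pixel class labels. It is not taken from the SpikingEDN code base: the function name `mean_iou`, the `ignore_index` parameter, and the choice to skip classes absent from both prediction and ground truth are illustrative assumptions, and the evaluation protocol actually used for DDD17 and DSEC-Semantic may differ (for example, in how counts are accumulated across the dataset).

```python
from typing import Optional, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_index: Optional[int] = None) -> float:
    """Mean IoU over flattened per-pixel class labels (hypothetical helper)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where prediction and label agree
    pred_count = [0] * num_classes    # pixels predicted as each class
    target_count = [0] * num_classes  # pixels labeled as each class

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # assumption: unlabeled pixels are excluded entirely
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: skip classes absent from both maps
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else float("nan")
```

With `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the function returns their average, 7/12.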