Semantic segmentation underpins applications such as autonomous driving, medical imaging, and robotics. Retrieving accurate semantic information under high dynamic range and rapid scene changes remains difficult for image-based algorithms, primarily because conventional image sensors suffer from motion blur and exposure artifacts. In contrast, event-based vision sensors, which asynchronously report changes in pixel intensity, acquire visual information at the rate of the scene dynamics and thereby mitigate these limitations. However, event-based semantic segmentation faces a key obstacle: existing methods must first convert event data into frame images to align with image-based segmentation techniques. This conversion squanders the inherently high temporal resolution of event data and compromises both the accuracy and the real-time performance of segmentation. To address these issues, this work explores a sparse semantic segmentation approach that operates directly on event data. We propose EventSegNet, a network that improves the extraction of geometric features from event data by combining geometric feature enhancement operations with attention mechanisms. We also introduce a large-scale event-based semantic segmentation dataset that provides a label for each event, on which our approach achieves an F1 score of 84.2%. In addition, we implemented a lightweight, edge-oriented AI inference deployment for the network model. Compared with the baseline, the optimized model gives up 1.1% in F1 score but runs more than twice as fast, enabling real-time inference on the NVIDIA AGX Xavier.
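As context for the event-to-frame conversion the abstract argues against, the following minimal Python sketch contrasts the two representations. The event format (x, y, t, polarity tuples) and all function names here are illustrative assumptions, not the paper's actual pipeline or the internals of EventSegNet.

```python
import numpy as np

# Hypothetical event stream: each event is (x, y, timestamp, polarity).
# Field names and shapes are illustrative, not the paper's data format.
events = np.array(
    [(12, 40, 0.001, 1), (13, 40, 0.002, -1), (12, 41, 0.004, 1)],
    dtype=[("x", int), ("y", int), ("t", float), ("p", int)],
)

def events_to_frame(events, height, width):
    """Accumulate events into a dense 2D frame -- the conversion step the
    abstract criticizes: per-event timestamps within the window are lost."""
    frame = np.zeros((height, width), dtype=np.int32)
    np.add.at(frame, (events["y"], events["x"]), events["p"])
    return frame

def events_as_points(events):
    """Keep events as a sparse (N, 4) point set, preserving each event's
    timestamp -- the representation a point-based network can consume."""
    return np.stack(
        [events["x"], events["y"], events["t"], events["p"]], axis=1
    )

frame = events_to_frame(events, height=64, width=64)  # dense, time-collapsed
points = events_as_points(events)                     # sparse, time-preserving
```

The sketch only illustrates why operating on the sparse point set avoids both the conversion latency and the loss of temporal resolution that frame accumulation incurs.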