Abstract

In recent years, intelligent driving has made considerable progress, and semantic segmentation has become one of the most effective scene perception methods. Traditional semantic segmentation methods can use RGB images to detect obstacles that are clearly visible in outdoor scenes. However, in complex real-world driving scenes, RGB images alone cannot provide sufficient information, and complementary information from other modalities is needed. In this paper, we propose the Non-Local Fusion Network (NLFNet), a semantic segmentation network that selectively and adaptively fuses multimodal inputs. It exploits complementary information collected by different optical sensors to extract effective features for fusion, thereby improving segmentation accuracy and addressing object recognition in challenging real-world scenes. We conduct comprehensive experiments across RGB-Depth, RGB-Polarization, and RGB-Thermal semantic segmentation to verify the effectiveness and generalization ability of the framework, which is especially suitable for autonomous driving and robot vision applications.
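To make the idea of non-local cross-modal fusion concrete, the sketch below shows one common way such a block can be built: a standard embedded-Gaussian non-local (attention) formulation in which queries come from the RGB stream and keys/values from the auxiliary modality. This is a minimal illustration under assumed design choices; the module name `NonLocalFusionBlock`, the layer sizes, and the residual formulation are hypothetical and not taken from the paper's actual architecture.

```python
import torch
import torch.nn as nn


class NonLocalFusionBlock(nn.Module):
    """Illustrative non-local fusion of an RGB feature map with a
    second-modality feature map (e.g. depth, polarization, or thermal)."""

    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inner = channels // reduction
        # Queries from the RGB stream, keys/values from the other modality,
        # so each RGB position can attend to complementary cues anywhere
        # in the scene (the "non-local" part).
        self.query = nn.Conv2d(channels, inner, kernel_size=1)
        self.key = nn.Conv2d(channels, inner, kernel_size=1)
        self.value = nn.Conv2d(channels, inner, kernel_size=1)
        self.out = nn.Conv2d(inner, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        b, c, h, w = rgb.shape
        q = self.query(rgb).flatten(2).transpose(1, 2)  # (B, HW, C')
        k = self.key(aux).flatten(2)                    # (B, C', HW)
        v = self.value(aux).flatten(2).transpose(1, 2)  # (B, HW, C')
        # Affinity between every RGB position and every auxiliary position.
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)
        fused = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        # Residual connection keeps the RGB stream intact when the
        # auxiliary modality contributes little useful information.
        return rgb + self.out(fused)


# Usage: fuse RGB and thermal feature maps of matching shape.
block = NonLocalFusionBlock(channels=64)
rgb_feat = torch.randn(1, 64, 32, 32)
thermal_feat = torch.randn(1, 64, 32, 32)
print(block(rgb_feat, thermal_feat).shape)  # torch.Size([1, 64, 32, 32])
```

The residual form lets the block learn to down-weight the auxiliary modality adaptively: if the attention output carries little signal, the original RGB features pass through largely unchanged, which matches the selective-fusion behavior described above.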
