Abstract

Semantic segmentation plays an important role in applications such as autonomous driving and robotic sensing. This work focuses on RGB-Thermal (RGB-T) semantic segmentation: RGB images are easily degraded by lighting conditions, e.g., darkness, whereas thermal images remain robust in night scenarios and thus serve as a complementary modality. However, existing works either naively fuse RGB-T images or encode both modalities with the same backbone, neglecting their semantic differences under varying lighting conditions. We therefore present a Residual Spatial Fusion Network (RSFNet) for RGB-T semantic segmentation. Specifically, RSFNet employs an asymmetric encoder to learn the complementary features of RGB and thermal images. To fuse the dual-modality features, it generates pseudo-labels via saliency detection to supervise feature learning, and develops a Residual Spatial Fusion (RSF) module with structural re-parameterization that learns more discriminative features by spatially fusing the cross-modality features. RSF aggregates multi-level features through hierarchical feature fusion, and applies spatial weights with a residual connection so that a confidence gate adaptively controls the multi-spectral feature fusion. Extensive experiments were carried out on two benchmarks, i.e., the MFNet and PST900 databases. The results show that our method achieves state-of-the-art segmentation performance and strikes a good balance between accuracy and speed.
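The gated fusion described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function name, the 1x1 gate projection, and the random toy features are all hypothetical stand-ins for the RSF module's learned convolutions and re-parameterized structure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_spatial_fusion(f_rgb, f_th, w_gate, b_gate):
    """Hedged sketch of confidence-gated RGB-T fusion.

    f_rgb, f_th : (C, H, W) feature maps from the two modality streams.
    w_gate, b_gate : stand-ins for a learned 1x1 projection that produces
    a per-pixel confidence map (the paper's learned module is richer).
    """
    # Concatenate along channels, then project to one logit per pixel.
    cat = np.concatenate([f_rgb, f_th], axis=0)                    # (2C, H, W)
    logits = np.tensordot(w_gate, cat, axes=([0], [0])) + b_gate   # (H, W)
    gate = sigmoid(logits)                                         # spatial weights in (0, 1)
    # Gate the cross-modality mix, then add a residual connection
    # back to the RGB stream.
    fused = gate * f_rgb + (1.0 - gate) * f_th + f_rgb
    return fused, gate

# Toy demo with random features.
C, H, W = 4, 8, 8
f_rgb = rng.standard_normal((C, H, W))
f_th = rng.standard_normal((C, H, W))
w_gate = rng.standard_normal(2 * C) * 0.1
fused, gate = residual_spatial_fusion(f_rgb, f_th, w_gate, 0.0)
print(fused.shape)  # same shape as the input feature maps
```

Where the gate saturates toward 1 the fused feature leans on RGB; toward 0 it leans on thermal, with the residual path preserving the RGB signal in either case.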
