Abstract
Complementary information from RGB and thermal images can substantially improve semantic segmentation performance. Existing RGB-T segmentation methods usually rely on simple interaction strategies to extract this complementary information, overlooking the recognizable features that arise from the different imaging mechanisms. To address this problem, we propose a multistage information interaction network for RGB-T semantic segmentation called MS-IRTNet. MS-IRTNet has a dual-stream encoder structure that extracts multistage feature information. To improve the interaction between the two modalities, we design a gate-weighted interaction module (GWIM) and a feature information interaction module (FIIM). GWIM learns multimodal information weights in different channels, while FIIM integrates and fuses the weighted RGB and thermal information into a single feature map. Finally, the multistage interactive information is fed into the decoder for semantic prediction. Our method achieves 60.5 mIoU on the MFNet dataset, outperforming state-of-the-art methods. Notably, MS-IRTNet also achieves state-of-the-art results on daytime images (51.7 mIoU) and nighttime images (62.5 mIoU). The code and pre-trained models are available at https://github.com/poisonzzw/MS-IRTNet.
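To make the idea of channel-wise gating followed by fusion concrete, here is a minimal sketch in plain Python. It is not the GWIM or FIIM implementation from the paper, whose internals the abstract does not specify. The function names, the use of global average pooling, the sigmoid gate, and the convex-combination fusion rule are all assumptions chosen for illustration, and the gate parameters are set by hand here rather than learned.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def channel_means(feat):
    """Global average pooling over a [C][H][W] nested list; returns C means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def gate_weighted_fusion(rgb, thermal, w_rgb, w_thermal, bias):
    """Hypothetical per-channel gated fusion (an assumption, not the paper's module).

    For each channel c:
        gate_c  = sigmoid(w_rgb[c] * mean(rgb_c) + w_thermal[c] * mean(thermal_c) + bias[c])
        fused_c = gate_c * rgb_c + (1 - gate_c) * thermal_c
    In a trained network w_rgb, w_thermal and bias would be learned parameters.
    """
    m_rgb = channel_means(rgb)
    m_th = channel_means(thermal)
    fused, gates = [], []
    for c in range(len(rgb)):
        g = sigmoid(w_rgb[c] * m_rgb[c] + w_thermal[c] * m_th[c] + bias[c])
        gates.append(g)
        fused.append([
            [g * r + (1.0 - g) * t for r, t in zip(r_row, t_row)]
            for r_row, t_row in zip(rgb[c], thermal[c])
        ])
    return fused, gates


# Toy inputs: 2 channels, 2x2 spatial size.
rgb_feat = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
thermal_feat = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]

# Zero weights and bias give a gate of 0.5 per channel, i.e. a plain average.
fused_feat, gate_values = gate_weighted_fusion(
    rgb_feat, thermal_feat, w_rgb=[0.0, 0.0], w_thermal=[0.0, 0.0], bias=[0.0, 0.0]
)
print(gate_values)
print(fused_feat)
```

A gate near 1 lets the RGB channel dominate and a gate near 0 favors the thermal channel, so the fused map has the same shape as either input. In a multistage design, a step of this kind would be applied to the encoder features at each stage before they are passed to the decoder.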