Abstract

RGB-Thermal semantic segmentation is widely used in various perception scenarios and has made significant progress. However, many existing methods overlook the critical challenge of striking a balance between speed and accuracy. In response, we introduce ECFNet, an efficient model tailored for real-time RGB-Thermal semantic segmentation that aims to balance speed and accuracy. Specifically, we enhance feature fusion with an Asymmetric Cross-layer Self-Attention (ACSA) module, which fuses feature maps across diverse intermediate layers. Additionally, we introduce a Light Effective Spatial Semantic Fusion (LESSF) module to merge feature maps from the final layer. To fully exploit the latent multi-modal feature information, we introduce a Multi-branch Cascade Decoder (MCD) composed of six Hybrid Attention Module (HAM) blocks, which aggregates multi-scale feature maps. We validated our approach on three publicly available benchmark datasets: MFNet, PST900, and FMB. Our method is effective and achieves a better balance between speed and accuracy, reaching 56.2% mIoU on the MFNet dataset at 62.4 FPS on a single NVIDIA GeForce GTX 1080Ti GPU. The code and results are available at https://github.com/WangJoyu/ECFNet.
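The abstract's central fusion idea, attending from the features of one layer (or modality) over the features of another, can be illustrated generically. The sketch below is not the paper's ACSA implementation (see the linked repository for that); it is a minimal, dependency-light NumPy illustration of cross-layer attention, where queries come from one feature map and keys/values from a coarser one, with all shapes and names chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_layer_attention(q_feat, kv_feat):
    """Generic cross-layer attention sketch (not the paper's exact ACSA).

    q_feat:  (Nq, C) flattened tokens from one layer/modality
    kv_feat: (Nk, C) flattened tokens from another (e.g. coarser) layer
    Returns fused features of shape (Nq, C): each query token is a
    softmax-weighted mixture of the key/value tokens.
    """
    d = q_feat.shape[-1]
    attn = softmax(q_feat @ kv_feat.T / np.sqrt(d), axis=-1)  # (Nq, Nk)
    return attn @ kv_feat                                     # (Nq, C)

# Toy usage: 64 high-resolution tokens attend over 16 coarse tokens.
rng = np.random.default_rng(0)
q = rng.standard_normal((64, 32))   # e.g. flattened RGB features
kv = rng.standard_normal((16, 32))  # e.g. flattened thermal features
fused = cross_layer_attention(q, kv)
print(fused.shape)  # (64, 32)
```

Because the attention matrix is Nq x Nk rather than Nq x Nq, attending a fine feature map over a coarse one keeps the cost low, which is one common way such cross-layer designs stay real-time friendly.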
