Abstract

Salient Object Detection (SOD) has been widely used in practical applications such as multi-sensor image fusion, remote sensing, and defect detection. Recently, SOD from RGB and thermal (T) images has developed rapidly due to its robustness in extreme situations such as low illumination and occlusion. However, existing methods all rely on a dual-stream encoder, which significantly increases the computational burden and hinders real-world deployment. To this end, we propose a real-time One-stream Semantic-guided Refinement Network (OSRNet) for RGB-T SOD. Specifically, we first fuse the RGB and T inputs via concatenation, addition, and multiplication operations to mine the complementary information between the two modalities. This efficient early fusion not only facilitates cross-modal information exchange but also avoids the cumbersome dual-stream encoder structure. We then propose a lightweight decoder in which high-level semantic information filters low-level noisy features and gradually refines the final prediction. We also apply deep supervision to make training more stable and faster. Thanks to the early fusion strategy, OSRNet runs at real-time speed (53-60 fps) on a single GPU. Extensive quantitative and qualitative experiments show that our network outperforms eleven state-of-the-art methods on seven evaluation metrics. Our code has been released at: https://github.com/huofushuo/OSRNet.
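To illustrate the early-fusion idea described in the abstract, below is a minimal PyTorch sketch, not the authors' released implementation (see the linked repository for that). The module name, channel sizes, and the way the concatenated, added, and multiplied cues are combined into a single encoder input are assumptions made for illustration only.

```python
# Minimal sketch (not the official OSRNet code) of early RGB-T fusion:
# the two modalities are combined via concatenation, addition, and
# multiplication before a single shared encoder, avoiding a dual-stream design.
import torch
import torch.nn as nn


class EarlyFusion(nn.Module):
    """Hypothetical early-fusion block; names and channel sizes are assumptions."""

    def __init__(self, in_channels: int = 3, out_channels: int = 3):
        super().__init__()
        # Project the concatenated RGB+T tensor back to the encoder's input depth.
        self.reduce = nn.Conv2d(2 * in_channels, out_channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        cat_feat = self.reduce(torch.cat([rgb, thermal], dim=1))  # concatenation
        add_feat = rgb + thermal                                  # addition
        mul_feat = rgb * thermal                                  # multiplication
        # Merge the three complementary cues into one input for a single encoder.
        return cat_feat + add_feat + mul_feat


if __name__ == "__main__":
    rgb = torch.randn(1, 3, 352, 352)
    thermal = torch.randn(1, 3, 352, 352)
    fused = EarlyFusion()(rgb, thermal)
    print(fused.shape)  # torch.Size([1, 3, 352, 352])
```

Because the fused tensor keeps the shape of a single RGB image, it can feed any standard one-stream backbone, which is the source of the reported real-time speed.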
