Efficiently fusing infrared and visible images is critical for their real-time application in various electro-optic systems. In recent years, deep learning-based methods have significantly improved fusion quality by leveraging feature-based fusion. However, large-scale networks are usually time- and resource-consuming. Although current lightweight networks have improved efficiency by reducing network size, their fusion quality and processing speed remain unsatisfactory, especially when deployed on embedded systems. In this paper, we propose a pixel-based fusion strategy to design a simple yet efficient network that adaptively learns pixel-by-pixel weights for image fusion. Building on the proposed fusion network, we further combine it with a detection model to construct a joint optimization framework that optimizes the low-level fusion task and the high-level detection task cooperatively. The fusion quality and processing speed of various infrared and visible image fusion networks are evaluated on multiple datasets and platforms. The evaluation results demonstrate that the proposed method yields comparable or even better quantitative and qualitative results than current state-of-the-art large-scale and lightweight networks. In addition, the joint optimization is effective in producing better fusion quality and detection accuracy. More importantly, the proposed fusion network requires only ∼27 ms to fuse a pair of infrared and visible images with a resolution of 512 × 512 on a Jetson Xavier NX, which is only a third of the time required by the previously fastest network. Therefore, the proposed method is an efficient solution for real-time infrared and visible image fusion on embedded systems.
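To make the pixel-based fusion strategy concrete, the sketch below shows one plausible reading of "learning pixel-by-pixel weights adaptively": a small CNN predicts a per-pixel weight map w, and the fused image is the convex combination w · IR + (1 − w) · VIS. This is a minimal illustration, not the authors' actual architecture; the class name, layer widths, and single-channel inputs are all assumptions.

```python
# Minimal sketch of pixel-wise adaptive-weight fusion (hypothetical; the
# paper's real network and training losses are not reproduced here).
import torch
import torch.nn as nn


class PixelWeightFusion(nn.Module):
    """Lightweight network that predicts a per-pixel fusion weight map."""

    def __init__(self, channels: int = 16):
        super().__init__()
        # The two grayscale inputs (infrared + visible) are concatenated
        # along the channel axis, giving 2 input channels.
        self.net = nn.Sequential(
            nn.Conv2d(2, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # constrains each pixel's weight to (0, 1)
        )

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        w = self.net(torch.cat([ir, vis], dim=1))  # (N, 1, H, W) weight map
        return w * ir + (1.0 - w) * vis            # convex per-pixel combination


if __name__ == "__main__":
    # Fuse one 512 x 512 pair, matching the resolution quoted in the abstract.
    model = PixelWeightFusion().eval()
    ir = torch.rand(1, 1, 512, 512)
    vis = torch.rand(1, 1, 512, 512)
    with torch.no_grad():
        fused = model(ir, vis)
    print(fused.shape)  # torch.Size([1, 1, 512, 512])
```

Because the network only has to produce a scalar weight per pixel rather than reconstruct fused features, a handful of convolutions suffices, which is consistent with the abstract's emphasis on embedded-friendly runtime.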