Abstract

In this paper, we propose a multistage fusion network to solve the infrared and visible image fusion (IVIF) problem. Unlike other deep learning methods, our network architecture is designed to address the complex trade-off between spatial details and contextual information. To balance these competing goals, our main proposal is a multistage architecture that progressively learns fusion functions for the source infrared and visible images, thereby breaking the overall recovery process into more manageable steps. Specifically, our model first learns contextual features using an encoder-decoder architecture with downsampling operations and later combines them with a full-resolution branch that retains local details. Between stages, we introduce a cross-stage fusion module (CSFM) to propagate multiscale contextual features from an earlier stage to a later stage. In addition, we introduce an upsampling module that overcomes both checkerboard artifacts and blurring effects by applying bilinear interpolation followed by a deformable convolution. The resulting tightly interlinked multistage fusion network, named MSFNet, outperforms state-of-the-art methods on publicly available datasets.
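To make the upsampling idea concrete, below is a minimal PyTorch sketch of an upsampling block that follows the abstract's description: bilinear interpolation followed by a deformable convolution. The module name, channel counts, and the use of a plain convolution to predict the deformable offsets are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the described upsampling module: bilinear interpolation
# followed by a deformable convolution. Names and hyperparameters are
# assumptions for illustration, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d


class DeformableUpsample(nn.Module):
    """Upsample by bilinear interpolation, then refine with a 3x3
    deformable convolution to suppress checkerboard artifacts and blur."""

    def __init__(self, in_channels: int, out_channels: int, scale: int = 2):
        super().__init__()
        self.scale = scale
        # A plain conv predicts the 2D sampling offsets: 2 * k * k
        # channels for a k x k deformable kernel (here k = 3).
        self.offset_conv = nn.Conv2d(in_channels, 2 * 3 * 3,
                                     kernel_size=3, padding=1)
        self.deform_conv = DeformConv2d(in_channels, out_channels,
                                        kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Bilinear upsampling avoids the uneven kernel overlap of
        # transposed convolutions that causes checkerboard patterns.
        x = F.interpolate(x, scale_factor=self.scale,
                          mode="bilinear", align_corners=False)
        offsets = self.offset_conv(x)
        # The deformable conv samples at learned offsets, letting the
        # kernel adapt to local structure and re-sharpen the result.
        return self.deform_conv(x, offsets)


if __name__ == "__main__":
    up = DeformableUpsample(in_channels=64, out_channels=64)
    feat = torch.randn(1, 64, 32, 32)
    print(up(feat).shape)  # torch.Size([1, 64, 64, 64])
```

The design choice here mirrors the abstract's reasoning: interpolation first removes the checkerboard failure mode of learned upsampling, and the deformable convolution then recovers sharpness that plain interpolation would lose.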
