The fusion of multispectral (MS) and panchromatic (PAN) images is key to constructing high-resolution remote sensing images. Because of sensor limitations, neither an MS nor a PAN image alone captures the complete information of a scene. Fusing MS images, which carry rich spectral content, with PAN images, which carry fine spatial detail, to construct a high-resolution MS image is therefore a central problem. In this work, an adaptive shuffle attention (ASA) module and an optimized UNet++ are combined in a fusion-UNet++ (F-UNet++) framework for MS and PAN image fusion. The ASA module focuses on important information in the mixed domain and adjusts the dimensions of tensors. F-UNet++ comprises a multiscale feature extraction module, a multiscale feature fusion module, and an image reconstruction module. The multiscale feature extraction module obtains spectral and spatial information, the multiscale feature fusion module fuses this information, and a composite multi-input image reconstruction module (CMI-UNet++) reconstructs the final image. By incorporating the ASA module, the loss of feature information is reduced, enhancing the fidelity of the spectral and spatial information in the fused image. Experiments show that F-UNet++ is qualitatively and quantitatively superior to current image fusion methods. (The code is available at <uri xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">https://github.com/Josephing/F-UNet</uri>).
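The abstract does not detail the ASA module's internals, but shuffle-attention designs generally rely on a channel-shuffle step that interleaves channels across groups so grouped branches exchange information. The sketch below illustrates that operation only; it is a minimal NumPy illustration under that assumption, not the paper's implementation.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Channel shuffle, as used in shuffle-attention-style modules:
    interleave channels across groups so information mixes between
    grouped branches. Input x has shape (N, C, H, W)."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    # Split channels into groups, then swap the group and per-group axes
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

# Example: 4 channels in 2 groups; order [0, 1, 2, 3] becomes [0, 2, 1, 3]
x = np.arange(4, dtype=float).reshape(1, 4, 1, 1)
y = channel_shuffle(x, groups=2)
print(y.ravel())  # [0. 2. 1. 3.]
```

The shuffle is a pure reindexing (reshape, transpose, reshape), so it is parameter-free and invertible, which is why it can mix grouped features without adding model weights.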