ABSTRACT Pan-sharpening fuses a panchromatic (PAN) image with a multispectral (MS) image according to prescribed fusion rules to generate a high-resolution multispectral (HRMS) image with rich spatial detail and a uniform spectral distribution. It has become an important technology in remote sensing image processing. Recently, deep convolutional neural networks have achieved remarkable results in pan-sharpening. Building on this approach, this paper proposes PSAM-NET, a pan-sharpening network for remote sensing images based on depth expansion combined with Cross-Attention Fusion. The network consists of a gradient projection module and a main image fusion module. The gradient projection module alternately stacks low-resolution multispectral images and panchromatic images, and solves the two associated deep-prior-regularized optimization problems with the gradient projection algorithm. The main image fusion module performs dual-branch fusion, composed mainly of cross-attention fusion and channel-attention fusion, to improve fusion quality. Simulated and real-data experiments were performed on the standard WV2, GF-2, and QB datasets. Qualitative analysis and quantitative comparison with classical pan-sharpening methods show that the images produced by the proposed method retain complete spatial information and a uniform spectral distribution, and that the method holds some advantage on the evaluation indices.