ABSTRACT Pansharpening is an important technique for obtaining high-resolution multispectral (HRMS) images by fusing low-resolution multispectral (LRMS) images with high-resolution panchromatic (PAN) images. Although many pansharpening models based on deep learning (DL) have emerged, there remains a pressing need to further assess pansharpening accuracy and stability when the LRMS images contain complex land-cover types. Moreover, these models often fail to exploit the high-frequency information inherent in PAN images. To address these issues, we propose a pansharpening model that combines multi-level and multi-scale network architectures. The multi-level architecture builds spatial-spectral dependence on LRMS-PAN pairs and strengthens the network’s feature-capture capability by preserving multi-level texture details. The multi-scale architecture is subsequently used to extract the spatial structure and deep texture of the PAN images at different scales. Downsampled and real-data experiments on four standard datasets show that the proposed model achieves state-of-the-art performance.