ABSTRACT Pan-sharpening aims to obtain high resolution multispectral (HRMS) images by integrating the information in the panchromatic and multispectral images. Existing pan-sharpening methods have demonstrated impressive sharpening performance. However, these methods inherently overlook the complementary characteristics and interaction between diverse source images, resulting in sharpened outcomes accompanied by distortion. To solve the above problems, we construct a novel cross fusion model with fine-grained detail reconstruction from the perspective of frequency-domain. The motivation of the model is twofold: (1) to reconstruct spatial detail representations from diverse source images, laying the foundation for the generation of fine details in the subsequent fused images; and (2) to enhance the interaction between diverse source features during the fusion process in order to attain high-fidelity fusion outcomes. Based on the theoretical model, we develop a frequency-spectral dual domain cross fusion network (CF2N) utilizing the deep learning technique. Consequently, the CF2N consist of two main stages, namely frequency-domain dominated detail reconstruction (FD2R) and frequency-spectral cross fusion (FSCF). Specifically, a more reasonable reconstruction of fine frequency details in HRMS can be achieved by performing adaptive weighted fusion of frequency details in the FD2R stage. Furthermore, the FSCF module, which seamlessly integrates frequency- and spectral-domain details in a highly interactive cross fusion manner. As a result, the CF2N possesses the capability to attain high frequency-spectral fidelity results with excellent interpretability. Extensive experiments show the superior performance of ours over state of the art, while maintaining high efficiency. All implementations of this work will be published at our website.