Recently, deep learning (DL) approaches have been widely applied to the pansharpening problem, which is defined as fusing a low-resolution multispectral (LRMS) image with a high-resolution panchromatic (PAN) image to obtain a high-resolution multispectral (HRMS) image. However, most DL-based methods handle this task by designing black-box network architectures to model the mapping from LRMS and PAN to HRMS. Such architectures generally lack sufficient interpretability, which limits further performance improvements. To address this issue, we adopt a model-driven approach to design an interpretable deep network structure for pansharpening. First, we present a new pansharpening model based on convolutional sparse coding (CSC), which differs substantially from current pansharpening frameworks. Second, an alternating algorithm is developed to optimize this model. This algorithm is then unfolded into a network, where each network module corresponds to a specific operation of the iterative algorithm. The proposed network therefore has clear physical interpretations, and all the learnable modules can be trained in an end-to-end manner from the given dataset. Experimental results on benchmark datasets show that our network outperforms other state-of-the-art methods both quantitatively and qualitatively.
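To illustrate the unfolding idea behind the proposed network, the sketch below unrolls a fixed number of ISTA iterations for a simple (non-convolutional) sparse coding problem; each loop iteration plays the role of one network "layer". This is a minimal, hypothetical illustration of algorithm unrolling in general, not the paper's actual CSC model or architecture: in a learned (LISTA-style) network, the step size and threshold below would be trainable parameters.

```python
import numpy as np

def soft_threshold(x, theta):
    # Proximal operator of the l1 norm (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def unrolled_ista(y, D, num_layers=10, step=None, theta=0.1):
    """Unrolled ISTA for sparse coding: minimize
        0.5 * ||y - D z||^2 + theta * ||z||_1
    over z. Each loop iteration corresponds to one network module
    in an unfolded (deep-unrolling) architecture; `step` and `theta`
    would be learned end-to-end in a LISTA-style network."""
    if step is None:
        # 1 / L, where L is the Lipschitz constant of the gradient.
        step = 1.0 / (np.linalg.norm(D, 2) ** 2)
    z = np.zeros(D.shape[1])
    for _ in range(num_layers):
        # Gradient step on the data term, then the l1 proximal step.
        z = soft_threshold(z - step * D.T @ (D @ z - y), step * theta)
    return z
```

Replacing the dense dictionary multiplication with convolutions (and the scalar threshold with per-channel learnable thresholds) yields the convolutional counterpart that underlies CSC-based unrolled networks.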