Remote sensing applications, such as the detection and recognition of objects, require a large number of high spatial resolution (HR) images. Image fusion is considered a feasible technique for producing HR images. However, most remote sensing image fusion methods based on deep neural networks (DNNs) focus mainly on the extraction of spatial and spectral features and ignore the redundancy among these features, which causes spatial and spectral distortions in the fused image. In this paper, we propose a novel image fusion method based on a triple disentangled network (TDNet) with dual attention to reduce the redundancy in the extracted features. The proposed method assumes that the information in panchromatic (PAN) and low spatial resolution multispectral (LRMS) images can be encoded as spatial, spectral, and common features. Specifically, we construct a triple-stream network to extract these features. To efficiently model the spatial and spectral information in the PAN and LRMS images, local–global attention and interdependency attention are designed and integrated into the network. The redundancy among the extracted features is then reduced by disentangled learning, in which the features are recombined to reconstruct the PAN and LRMS images. Moreover, these features should complement each other, so we employ maximal coding rate reduction to balance the redundancy and complementarity among them. Finally, all features are recombined to synthesize the high spatial resolution multispectral (HRMS) image. Experimental results demonstrate that the proposed TDNet achieves superior performance in both qualitative and quantitative evaluations. The code is available at https://github.com/RSMagneto/TDNet.
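The maximal coding rate reduction objective mentioned above expands the coding rate of the combined feature set while compressing that of each group, which is what allows it to trade off redundancy against complementarity among the spatial, spectral, and common features. The sketch below is a minimal PyTorch illustration of that objective only, not the paper's implementation: the tensor names (f_spa, f_spe, f_com), the assumption that each stream's features are flattened into per-pixel column vectors that form the groups of the partition, and the default value of eps are all hypothetical choices made for illustration.

```python
import torch

def coding_rate(Z: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """Coding rate R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z Z^T) for Z of shape (d, n)."""
    d, n = Z.shape
    I = torch.eye(d, device=Z.device, dtype=Z.dtype)
    return 0.5 * torch.logdet(I + (d / (n * eps ** 2)) * Z @ Z.T)

def rate_reduction(groups, eps: float = 0.5) -> torch.Tensor:
    """
    Coding rate reduction over a partition of features.
    `groups` is a list of (d, n_j) tensors sharing the feature dimension d;
    here they would be the spatial, spectral, and common features.
    Maximizing the returned value expands the whole ensemble while
    compressing each group, i.e., the groups stay complementary but
    mutually non-redundant.
    """
    Z = torch.cat(groups, dim=1)                  # whole ensemble, shape (d, n)
    n = Z.shape[1]
    expand = coding_rate(Z, eps)                  # volume of the full feature set
    compress = sum(
        (g.shape[1] / n) * coding_rate(g, eps)    # weighted volume of each group
        for g in groups
    )
    return expand - compress

# Hypothetical usage: f_spa, f_spe, f_com are (d, n_j) feature matrices
# produced by the three encoder streams; a training loss term could then be
#   loss_mcr = -rate_reduction([f_spa, f_spe, f_com])
```

In this reading, minimizing the negative rate reduction discourages the three streams from encoding the same information (redundancy) while keeping the concatenated representation expressive (complementarity); the exact grouping, normalization, and weighting used by TDNet may differ from this sketch.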