Hyperspectral image (HSI) and multispectral image (MSI) fusion aims to generate a hyperspectral image with high spatial and spectral resolution (HR-HSI) by fusing a high-resolution multispectral image (HR-MSI) with a low-resolution hyperspectral image (LR-HSI). However, existing fusion methods face challenges such as unknown degradation parameters and incomplete exploitation of the correlation between high-dimensional structures and deep image features. To overcome these issues, this article proposes an unsupervised blind fusion method for LR-HSI and HR-MSI based on deep Tucker decomposition and spatial-spectral manifold learning (DTDNML). We design a novel deep Tucker decomposition network that maps the LR-HSI and HR-MSI into a consistent feature space and reconstructs them through decoders with shared parameters. To better exploit and fuse the spatial-spectral features in the data, we design a core tensor fusion network (CTFN) that incorporates a spatial-spectral attention mechanism to align and fuse features at different scales. Furthermore, to enhance the capacity for capturing global information, a Laplacian-based spatial-spectral manifold constraint is introduced into the shared decoders. Extensive experiments validate that the proposed method improves the accuracy and efficiency of hyperspectral and multispectral fusion on several remote sensing datasets. The source code is available at https://github.com/Shawn-H-Wang/DTDNML.
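As background for the Tucker-based formulation above, the following is a minimal numpy sketch (not the authors' implementation) of how an HR-HSI cube can be represented as a small core tensor multiplied along each mode by factor matrices; all sizes and variable names (`U_h`, `U_w`, `U_b`, etc.) are illustrative assumptions.

```python
import numpy as np

def mode_n_product(tensor, matrix, mode):
    """Multiply a tensor by a matrix along the given mode (mode-n product)."""
    # Move the target mode to the front, unfold, multiply, then fold back.
    t = np.moveaxis(tensor, mode, 0)
    shape = t.shape
    unfolded = t.reshape(shape[0], -1)            # mode-n unfolding: (I_n, prod of other dims)
    result = matrix @ unfolded                    # (J, I_n) @ (I_n, ...) -> (J, ...)
    new_shape = (matrix.shape[0],) + shape[1:]
    return np.moveaxis(result.reshape(new_shape), 0, mode)

# Hypothetical sizes: target H x W x B cube, low-rank (h, w, b) core tensor.
H, W, B = 32, 32, 16
h, w, b = 8, 8, 4
rng = np.random.default_rng(0)
core = rng.standard_normal((h, w, b))   # core tensor (what CTFN would fuse)
U_h = rng.standard_normal((H, h))       # spatial (height) factor matrix
U_w = rng.standard_normal((W, w))       # spatial (width) factor matrix
U_b = rng.standard_normal((B, b))       # spectral factor matrix

# Tucker reconstruction: X = core x_1 U_h x_2 U_w x_3 U_b
X = mode_n_product(mode_n_product(mode_n_product(core, U_h, 0), U_w, 1), U_b, 2)
print(X.shape)  # (32, 32, 16)
```

In this decomposition the spatial factors carry the high-resolution detail of the HR-MSI while the spectral factor carries the band structure of the LR-HSI, which is the intuition behind fusing the two inputs in a shared core-tensor space.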