High-resolution (HR) remote sensing (RS) imaging opens the door to highly accurate geometric analysis of objects. However, it is difficult to use massive HR RS images jointly in practical applications, because these images are often collected under different multimodal conditions (multisource, multiarea, multitemporal, multiresolution, and multiangular), and a learning method trained for one condition is difficult to transfer to others. The key challenge is to simultaneously tackle three main problems: spectral drift, spatial deformation, and band inconsistency. To address these problems, we propose an unsupervised tensorized principal component alignment framework in this paper. In this framework, local spatial–spectral patches serve as the basic units so that alignment can be performed simultaneously across multiple dimensions. The framework seeks a domain-invariant tensor feature space by learning multilinear mapping functions that align the source tensor subspace with the target tensor subspace along each dimension. In addition, a Mahalanobis-distance-based approach for estimating the dimensionality of a tensor subspace is proposed to determine the best size of the aligned tensor subspace and thereby reduce computational complexity. HR images from the GF-1, GF-2, DEIMOS-2, WorldView-2, and WorldView-3 satellites are used to evaluate performance. The experimental results show two points: first, the proposed alignment framework not only aligns multimodal HR data more accurately than existing state-of-the-art domain adaptation methods but also offers a fast and simple procedure for the large-scale data volumes produced by HR imaging; second, the proposed tensor dimensionality estimation method is an efficient technique for finding the intrinsic dimensions of high-order data.
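To make the alignment idea concrete, the sketch below shows one plausible reading of mode-wise tensor subspace alignment: per-mode principal bases are extracted from stacks of spatial–spectral patches, and each source-domain basis is rotated toward its target-domain counterpart before projection. This is a minimal NumPy illustration, not the authors' implementation; the function names, the plain-SVD basis estimation, and the example patch sizes and ranks are all assumptions.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-k matricization: rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """n-mode product: multiply `matrix` into `tensor` along `mode`."""
    moved = np.tensordot(matrix, np.moveaxis(tensor, mode, 0), axes=(1, 0))
    return np.moveaxis(moved, 0, mode)

def leading_basis(patches, mode, rank):
    """Leading left singular vectors of the stacked mode-k unfolding."""
    U, _, _ = np.linalg.svd(unfold(patches, mode), full_matrices=False)
    return U[:, :rank]

def align_tensor_subspaces(src, tgt, ranks):
    """Mode-wise subspace alignment for patch stacks of shape (n, h, w, b).

    For each non-sample mode k, learn the alignment matrix
    M_k = U_s^T U_t (a tensor analogue of matrix subspace alignment)
    and project both domains into the target subspace coordinates.
    `ranks` gives the aligned subspace size per non-sample mode.
    """
    src_feat, tgt_feat = src, tgt
    for k, r in zip(range(1, src.ndim), ranks):
        U_s = leading_basis(src, k, r)           # source mode-k basis
        U_t = leading_basis(tgt, k, r)           # target mode-k basis
        M_k = U_s.T @ U_t                        # per-mode alignment matrix
        src_feat = mode_dot(src_feat, (U_s @ M_k).T, k)  # align, then project
        tgt_feat = mode_dot(tgt_feat, U_t.T, k)
    return src_feat, tgt_feat

# Illustrative usage with random stand-ins for 9x9, 4-band patches
# (hypothetical shapes and ranks, chosen only for demonstration).
rng = np.random.default_rng(0)
src = rng.standard_normal((500, 9, 9, 4))    # source-domain patch stack
tgt = rng.standard_normal((400, 9, 9, 4))    # target-domain patch stack
src_a, tgt_a = align_tensor_subspaces(src, tgt, ranks=(5, 5, 3))
print(src_a.shape, tgt_a.shape)              # (500, 5, 5, 3) (400, 5, 5, 3)
```

Rotating U_s by M_k before projection expresses the source patches in the coordinates of the target basis, which is the step that makes features from the two domains directly comparable; the ranks passed to the function play the role of the aligned subspace sizes that the proposed Mahalanobis-distance-based estimator would select.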