Abstract

By automatically learning the priors embedded in images with powerful modelling capabilities, deep learning‐based algorithms have recently made considerable progress in reconstructing high‐resolution hyperspectral (HR‐HS) images. These methods typically rely on a large amount of previously collected external data and are trained under full supervision of the ground‐truth data. Consequently, in the research paradigm that merges a low‐resolution hyperspectral (LR‐HS) image with an HR multispectral (HR‐MS) or RGB image, commonly referred to as HSI super‐resolution (HSI SR), constructing a database requires collecting corresponding training triplets (HR‐MS (RGB), LR‐HS and HR‐HS images) simultaneously, which is often difficult in practice. Moreover, models learned from training datasets collected under controlled conditions may perform significantly worse on real images captured in diverse environments. To address these limitations, the authors propose to leverage deep internal learning and self‐supervised learning to solve the HSI SR problem. The authors advocate that it is possible to train a specific CNN model at test time, termed deep internal learning (DIL), by preparing the training triplets online from the observed LR‐HS/HR‐MS (or RGB) images and the down‐sampled version of the LR‐HS image. However, the number of training triplets that can be extracted solely from transformed versions of the observation itself is extremely small, particularly for HSI SR tasks with large spatial upscale factors, which limits reconstruction performance. To solve this problem, the authors further exploit deep self‐supervised learning (DSL) by treating the observations as unlabelled training samples.
Specifically, degradation modules are built into the network to realise the spatial and spectral down‐sampling procedures that transform the generated HR‐HS estimate into approximations of the HR‐RGB and LR‐HS observations; the reconstruction errors of the observations are then used to measure the network's modelling performance. By consolidating DIL and DSL into a unified deep framework, the authors construct a more robust HSI SR method that requires no prior training and has great potential for flexible adaptation to the different settings of each observation. To verify the effectiveness of the proposed approach, extensive experiments were conducted on two benchmark HS datasets, CAVE and Harvard, and demonstrate a substantial performance gain of the proposed method over state‐of‐the‐art methods.
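The two ingredients described above, internal triplets built from the observations and a reconstruction error computed through spatial and spectral degradation, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the use of block averaging as the spatial degradation, the spectral response matrix `srf`, the L1 error, and the exact composition of the internal triplet are all assumptions made for illustration.

```python
import numpy as np


def spatial_downsample(x, factor):
    """Average an (H, W, C) image over non-overlapping factor x factor blocks.

    Assumption: block averaging stands in for the spatial degradation.
    """
    h, w, c = x.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the factor")
    blocks = x.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def spectral_downsample(x, srf):
    """Project an (H, W, B) hyperspectral image to (H, W, b) bands.

    Assumption: srf is a (B, b) spectral response matrix, e.g. b = 3 for RGB.
    """
    return x @ srf


def self_supervised_loss(hr_hs_est, lr_hs_obs, hr_rgb_obs, factor, srf):
    """Reconstruction error of both observations from an HR-HS estimate.

    Assumption: mean absolute error, equally weighted for the two terms.
    """
    lr_hs_hat = spatial_downsample(hr_hs_est, factor)   # approximates LR-HS
    hr_rgb_hat = spectral_downsample(hr_hs_est, srf)    # approximates HR-RGB
    spatial_err = np.mean(np.abs(lr_hs_hat - lr_hs_obs))
    spectral_err = np.mean(np.abs(hr_rgb_hat - hr_rgb_obs))
    return spatial_err + spectral_err


def internal_triplet(lr_hs_obs, hr_rgb_obs, factor):
    """Build one (input HS, input RGB, target HS) triplet from observations.

    Assumption: the observed LR-HS image acts as the target, and both
    observations are spatially down-sampled once more to form the inputs.
    """
    return (
        spatial_downsample(lr_hs_obs, factor),
        spatial_downsample(hr_rgb_obs, factor),
        lr_hs_obs,
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    factor = 4
    hr_hs = rng.random((16, 16, 31))          # synthetic stand-in for HR-HS
    srf = rng.random((31, 3))
    srf /= srf.sum(axis=0)                    # each output band sums to 1
    lr_hs = spatial_downsample(hr_hs, factor)
    hr_rgb = spectral_downsample(hr_hs, srf)
    print(lr_hs.shape, hr_rgb.shape)
    print(self_supervised_loss(hr_hs, lr_hs, hr_rgb, factor, srf))
```

In this sketch the loss needs no HR‐HS ground truth: it vanishes when the estimate reproduces both observations after degradation, which is why the observations can serve as unlabelled training samples. In a real setting, `hr_hs_est` would be the output of the CNN and the loss would be minimised with an automatic differentiation framework.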