Downscaling land surface temperature (LST) from satellite imagery is essential for many fine-scale applications. However, downscaling accuracy is often limited by varying environmental and geographical conditions. In this work, a novel LST downscaling framework is proposed to improve accuracy, especially in heterogeneous areas with mixed land covers and complex terrain. The framework downscales MODIS LST from 1 km to 100 m using the proposed geographically and temporally neural network weighted autoregression (GTNNWAR) model, with spatio-temporally fused scaling factors derived from Landsat 8 imagery and digital surface models (DSMs). To address the non-stationarity of the scaling factors in heterogeneous areas, a region-adaptive parameterization approach is applied first. GTNNWAR then uses a two-stage deep neural network to estimate the regression coefficients, so that the scaling factors receive adaptively varying weights, which improves prediction performance. Moreover, GTNNWAR incorporates a spatial autoregressive model that accounts for neighborhood effects, further improving overall accuracy. Before downscaling with GTNNWAR, a filter-based fusion method is applied to ensure that the scaling factors are spatio-temporally consistent enough for the neural networks to converge. The results show that the proposed framework achieves high accuracy at the boundaries between land covers and in complex terrain. Compared with several other downscaling algorithms in three case study areas (Beijing and Zhangye in China, and Netherlands–Germany in Europe), the proposed framework performs best, with a 28% improvement in R-squared (R2) and a root mean square error (RMSE) of 1.02 K. In addition, the downscaled LST achieves an R2 above 0.63 against UAV observations in Guangdong.
It is concluded that the proposed framework is highly reliable and robust in providing LST datasets with high spatio-temporal resolution across a wide range of land types.
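The abstract describes GTNNWAR only at a high level. As a minimal illustration under stated assumptions, and not the authors' implementation, the sketch below shows how per-pixel regression coefficients (which GTNNWAR would obtain from its two-stage neural network) can be combined with a spatial autoregressive neighbour term. It assumes a spatial-lag (SAR) form y = ρWy + Σ_k β_k x_k, a row-normalised 3×3 pixel neighbourhood for W, and a fixed ρ. The function names `queen_weights` and `sar_predict` are hypothetical, and random coefficients stand in for the network output.

```python
import numpy as np


def queen_weights(rows: int, cols: int) -> np.ndarray:
    """Row-normalised 3x3 ("queen") neighbour matrix for a rows x cols pixel grid.

    Assumption: neighbours are the up-to-8 adjacent pixels; the grid must contain
    at least two pixels so that every row has a non-zero sum.
    """
    n = rows * cols
    w = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < rows and 0 <= cc < cols:
                        w[i, rr * cols + cc] = 1.0
    return w / w.sum(axis=1, keepdims=True)


def sar_predict(x: np.ndarray, beta: np.ndarray, w: np.ndarray, rho: float) -> np.ndarray:
    """Predict LST from location-specific coefficients plus a spatial-lag term.

    Assumed model: y = rho * W @ y + sum_k beta[:, k] * x[:, k].
    Solved in reduced form: y = (I - rho * W)^{-1} (row-wise x . beta).
    For a row-normalised W, |rho| < 1 keeps I - rho * W invertible.
    """
    local_trend = np.einsum("ik,ik->i", x, beta)  # per-pixel weighted sum of scaling factors
    n = x.shape[0]
    return np.linalg.solve(np.eye(n) - rho * w, local_trend)


# Usage on a hypothetical 3x3 grid: an intercept plus one scaling factor (e.g. NDVI).
rng = np.random.default_rng(0)
rows, cols = 3, 3
x = np.column_stack([np.ones(rows * cols), rng.uniform(0.1, 0.8, rows * cols)])
# Stand-in for network-estimated coefficients: drawn at random per pixel (illustrative only).
beta = np.column_stack([rng.normal(180.0, 5.0, rows * cols), rng.normal(-10.0, 2.0, rows * cols)])
w = queen_weights(rows, cols)
y = sar_predict(x, beta, w, rho=0.4)
print(y.reshape(rows, cols).round(2))
```

Solving the reduced form with `np.linalg.solve` captures neighbour feedback exactly on this toy grid; for full 100 m scenes, a sparse W and an iterative solver would be needed, since the dense matrix grows with the square of the pixel count.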