Spatially continuous surface air temperature (SAT) is important to many research areas in the geospatial community. It can be reconstructed with SAT estimation models that integrate accurate point measurements of SAT at ground sites with wall-to-wall datasets derived from spaceborne remote sensing observations. Because land surface temperature (LST) correlates strongly with SAT, estimation models are typically developed with LST as a primary input. Geostationary satellites can observe the Earth's surface over large areas at very high frequencies. Yet, compared with the substantial efforts to estimate SAT at daily or monthly scales using LST derived from MODIS, very few studies have estimated SAT at high temporal resolution based on LST from geostationary satellites. In this study, estimation models for hourly SAT were developed for the first time based on LST derived from FY-4A, the first geostationary satellite in China's new-generation meteorological observation mission. The models were built with random forest and fully cross-validated over a very large region with diverse geographic settings. They were specified with different sets of predictors to explore the influence of time and location variables on model performance. The overall predictive error of the models is about 1.65–2.08 K for sample-based cross-validation and 2.22–2.70 K for site-based cross-validation. Incorporating time or location variables into the hourly models significantly improves predictive performance, which is also confirmed by an analysis of predictive errors over time and across sites. The best-performing model, with an average RMSE of 2.22 K, was used to reconstruct maps of SAT for each hour. The hourly models developed in this study have general implications for future studies on large-scale estimation of hourly SAT based on geostationary LST datasets.
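
The difference between sample-based and site-based cross-validation can be illustrated with a minimal sketch. It covers only how folds are assigned, not the random forest fitting itself. The function names, the toy site identifiers, and the fold count are assumptions made for illustration and are not taken from the study. In the sample-based scheme, individual observations are shuffled into folds, so a site may contribute to both training and testing. In the site-based scheme, all observations from a held-out site are withheld together, which tests prediction at unseen locations and typically yields larger errors.

```python
import random
from collections import defaultdict


def sample_based_folds(n_samples, k, seed=0):
    """Yield (train_idx, test_idx), shuffling individual samples into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, test


def site_based_folds(site_ids, k, seed=0):
    """Yield (train_idx, test_idx), holding out whole sites in each fold."""
    by_site = defaultdict(list)
    for i, s in enumerate(site_ids):
        by_site[s].append(i)
    sites = sorted(by_site)
    random.Random(seed).shuffle(sites)
    site_folds = [sites[i::k] for i in range(k)]
    for i in range(k):
        held_out = set(site_folds[i])
        test = [j for j, s in enumerate(site_ids) if s in held_out]
        train = [j for j, s in enumerate(site_ids) if s not in held_out]
        yield train, test


# Toy data (hypothetical): 6 ground sites with 4 hourly samples each.
site_ids = [f"site{n}" for n in range(6) for _ in range(4)]

for train, test in site_based_folds(site_ids, k=3):
    train_sites = {site_ids[i] for i in train}
    test_sites = {site_ids[i] for i in test}
    # No site appears on both sides of a site-based split.
    print(sorted(test_sites), "overlap:", bool(train_sites & test_sites))
```

In practice, each train/test index pair would be used to fit a model on the training rows and score it on the test rows, and the per-fold errors would be averaged.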