Soil moisture is a key variable in agricultural research and precision irrigation decision-making. It determines how much soil water is available to plants and directly influences plant growth, yield and quality. Owing to variations in regional microclimate, landform, soil type and vegetation cover, soil moisture shows strong spatiotemporal heterogeneity at large regional scales. Microwave remote sensing can retrieve soil moisture from the soil dielectric constant under different weather conditions, while optical remote sensing uses spectral characteristics to estimate the physiological and ecological status of vegetation. In this study, two new hybrid models (ACO-RF and SSA-RF) were constructed by optimizing the standalone random forest (RF) model with the ant colony optimization (ACO) algorithm and the sparrow search algorithm (SSA), and six input combinations derived from multi-temporal Sentinel-1 and Landsat-8 remote sensing data from different sensors (optical, thermal and radar) were used. The standalone RF, ACO-RF and SSA-RF models with the different input combinations were used to predict soil moisture at four depths (5 cm, 10 cm, 20 cm and 40 cm) in a large-scale drip-irrigated citrus orchard. The results showed that ACO-RF and SSA-RF outperformed the standalone RF model in prediction accuracy across the 0–40 cm depth range, with R² of 0.800–0.921 for the hybrid models versus 0.504–0.798 for the standalone RF, and RRMSE of 7.214–16.284% versus 11.124–22.214%. Between the two hybrid models, ACO-RF was more accurate than SSA-RF, with R² of 0.805–0.921 versus 0.800–0.911 and RRMSE of 7.214–13.244% versus 8.274–16.284%. At depths of 5 cm, 10 cm and 20 cm, the inversion accuracy of models with microwave inputs was higher than that of models with multispectral inputs, with R² of 0.556–0.888 versus 0.541–0.886 and RRMSE of 9.015–19.544% versus 9.124–22.214%.
However, at a depth of 40 cm, the inversion accuracy of models with multispectral inputs was higher than that of models with microwave inputs, with R² of 0.532–0.841 versus 0.508–0.831 and RRMSE of 9.124–21.021% versus 9.142–21.214%. The models combining multispectral, thermal and microwave inputs achieved the highest accuracy in predicting soil moisture, with R² of 0.635–0.921 and RRMSE of 7.214–18.564%. Therefore, ACO-RF with multisource remote sensing data is recommended for predicting soil moisture in the drip-irrigated citrus orchard. This approach can provide data support for intelligent irrigation decisions on large-scale gridded land plots.
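
For readers unfamiliar with the two accuracy metrics reported above, the following minimal sketch shows one common way to compute them. It assumes that R² is the coefficient of determination and that RRMSE is the root mean square error divided by the mean of the observations, expressed as a percentage; the abstract does not state the exact formulas used, so these definitions, the function names and the sample values are illustrative assumptions only.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination (assumed definition: 1 - SS_res / SS_tot)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rrmse_percent(observed, predicted):
    """Relative RMSE in percent (assumed definition: 100 * RMSE / mean of observations)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / mean_obs


# Hypothetical soil moisture values (%), not data from the study.
observed = [20.0, 22.0, 24.0, 26.0]
predicted = [21.0, 21.0, 25.0, 25.0]

r2 = r_squared(observed, predicted)
rrmse = rrmse_percent(observed, predicted)
print(f"R2 = {r2:.3f}, RRMSE = {rrmse:.3f}%")
```

Under these definitions a higher R² and a lower RRMSE both indicate better agreement with the measurements, which is the sense in which the model comparisons above are made. Some studies instead report R² as the squared Pearson correlation, which can give different values for the same predictions.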