Reference evapotranspiration (ETo) is a key parameter for the sustainable use of agricultural water resources. To estimate ETo accurately when meteorological data are limited, this study combined the northern goshawk optimization (NGO) algorithm with the extreme gradient boosting (XGBoost) model to propose a novel NGO-XGBoost model. The performance of this model was evaluated using meteorological data from 30 stations in the North China Plain and compared with that of the XGBoost, random forest (RF), and k-nearest neighbor (KNN) models. An ensemble embedded feature selection (EEFS) method, which combines the results from the RF, XGBoost, adaptive boosting (AdaBoost), and categorical boosting (CatBoost) models, was used to rank the importance of meteorological factors in estimating ETo and thereby determine the optimal input combinations for the models. The results indicated that all models achieved high ETo estimation accuracy when the top 3, 4, and 5 most important factors were used as input combinations. The estimation accuracy of the four models differed significantly across stations, but the NGO-XGBoost model was consistently accurate: it ranked first by the global performance indicator (GPI), and its coefficient of determination (R²), Nash-Sutcliffe efficiency coefficient (NSE), root mean square error (RMSE), mean absolute error (MAE), and mean bias error (MBE) ranged over 0.920–0.998, 0.902–0.998, 0.078–0.623 mm d−1, 0.058–0.430 mm d−1, and −0.254 to 0.062 mm d−1, respectively. Furthermore, the accuracy of the NGO-XGBoost model in estimating ETo varied across seasons, and in winter it was more strongly affected by humidity and wind speed. When data from the target station were insufficient, the NGO-XGBoost model trained on historical data from neighboring stations still maintained high accuracy.
Overall, this study recommends a reliable method for estimating ETo and provides a reference for calculating ETo accurately in the North China Plain when meteorological data are limited.
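
For readers who wish to reproduce the evaluation, the five statistical indicators reported above can be computed from paired observed and estimated ETo values. The following is a minimal sketch rather than the code used in this study. The function name `evaluation_metrics` is hypothetical, R² is assumed to be the squared Pearson correlation coefficient, and MBE is assumed to follow the sign convention of estimated minus observed; the GPI is omitted because its formulation is not given here.

```python
import math


def evaluation_metrics(observed, estimated):
    """Return R2, NSE, RMSE, MAE and MBE for paired ETo values (mm d-1).

    Assumptions (not taken from the study):
    - R2 is the squared Pearson correlation coefficient.
    - MBE is the mean of (estimated - observed).
    """
    n = len(observed)
    if n == 0 or n != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n

    # Per-sample errors, estimated minus observed
    errors = [e - o for o, e in zip(observed, estimated)]

    ss_res = sum(err ** 2 for err in errors)               # residual sum of squares
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)    # spread of observations
    ss_est = sum((e - mean_est) ** 2 for e in estimated)   # spread of estimates
    cov = sum((o - mean_obs) * (e - mean_est)
              for o, e in zip(observed, estimated))

    if ss_obs == 0 or ss_est == 0:
        raise ValueError("R2 and NSE are undefined for constant series")

    return {
        "R2": cov ** 2 / (ss_obs * ss_est),
        "NSE": 1.0 - ss_res / ss_obs,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(err) for err in errors) / n,
        "MBE": sum(errors) / n,
    }


if __name__ == "__main__":
    # Illustrative values only, not data from the study
    obs = [1.0, 2.0, 3.0, 4.0]
    est = [1.1, 1.9, 3.2, 3.8]
    for name, value in evaluation_metrics(obs, est).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions R² measures only the linear association between estimates and observations, whereas NSE also penalizes bias, so NSE never exceeds R². This is consistent with the lower bound of NSE (0.902) falling below that of R² (0.920) in the ranges reported above.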