Accurate estimation of fine particulate matter (PM2.5) concentrations provides essential data for epidemiological studies and environmental management decisions. However, the differing sources and resolutions of satellite data and the limited number of monitoring stations reduce the accuracy of PM2.5 concentration estimates, which in turn complicates continuous, high-resolution PM2.5 estimation. In this study, we evaluate and optimize five machine learning methods, namely Cubist, random forest (RF), support vector machine (SVM), the generalized additive model (GAM), and Extreme Gradient Boosting (XGBoost), to estimate PM2.5 concentrations at high spatial resolution (1 km). Building on the optimized models, two model fusion methods, the Bayesian model averaging and Granger–Ramanathan averaging algorithms, are applied and compared for estimating PM2.5 concentrations. Among the five single machine learning models, the RF model performed best, with a coefficient of determination (R2) of 0.84, a root mean square error (RMSE) of 5.09 μg/m3, and a ratio of performance to interquartile distance (RPIQ) of 3.40. The Cubist model ranked second (R2 = 0.82, RMSE = 5.37 μg/m3, and RPIQ = 3.23). The estimation accuracies of the five models decreased in the following order: RF > Cubist > XGBoost > SVM > GAM. The fusion model outperformed every single model and reliably estimated the PM2.5 concentration (R2 = 0.86, RMSE = 5.04 μg/m3, and RPIQ = 3.44). The results show that the model fusion algorithms improved the overall performance of fine-scale, long-term PM2.5 concentration estimation, and that the accuracies of the fusion models are significantly higher than those of the five machine learning models used individually. The fusion model was used to generate annual PM2.5 concentration distribution maps of China for 2017 to 2021. This study provides a reference for estimating PM2.5 concentrations and contributes to the formulation of targeted policies.
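
To make the evaluation metrics and the Granger–Ramanathan fusion step concrete, the following is a minimal sketch and not the study's implementation. It assumes R2 is computed as 1 minus the ratio of residual to total sum of squares, RPIQ as the interquartile range of the observations divided by the RMSE, and the Granger–Ramanathan weights as an unconstrained ordinary least squares fit with an intercept (one of several variants of the method). The function names and the synthetic data are hypothetical and stand in for station observations and single-model predictions.

```python
import numpy as np


def rmse(obs, pred):
    """Root mean square error between observations and predictions."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def r2(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rpiq(obs, pred):
    """Interquartile range of the observations divided by the RMSE."""
    q1, q3 = np.percentile(np.asarray(obs, float), [25, 75])
    return float((q3 - q1) / rmse(obs, pred))


def gra_weights(obs, member_preds):
    """Granger-Ramanathan weights via OLS with intercept (assumed variant).

    member_preds has shape (n_samples, n_models). Returns an array of
    [intercept, w_1, ..., w_k]; the weights are not constrained to sum to 1.
    """
    obs = np.asarray(obs, float)
    member_preds = np.asarray(member_preds, float)
    design = np.column_stack([np.ones(len(obs)), member_preds])
    coef, *_ = np.linalg.lstsq(design, obs, rcond=None)
    return coef


def gra_predict(coef, member_preds):
    """Fused prediction: intercept plus weighted sum of member predictions."""
    return coef[0] + np.asarray(member_preds, float) @ coef[1:]


# Synthetic stand-in data: "observations" plus three noisy "model" outputs.
rng = np.random.default_rng(0)
obs = rng.gamma(shape=4.0, scale=10.0, size=500)
members = np.column_stack(
    [obs + rng.normal(0.0, sd, size=500) for sd in (5.0, 6.0, 8.0)]
)

coef = gra_weights(obs, members)
fused = gra_predict(coef, members)

for j in range(members.shape[1]):
    p = members[:, j]
    print(f"model {j}: R2={r2(obs, p):.3f} RMSE={rmse(obs, p):.3f} RPIQ={rpiq(obs, p):.3f}")
print(f"fused:   R2={r2(obs, fused):.3f} RMSE={rmse(obs, fused):.3f} RPIQ={rpiq(obs, fused):.3f}")
```

On the data used to fit the weights, the fused RMSE cannot exceed that of any single member, because each member is itself one possible linear combination. Whether that advantage carries over to unseen data has to be checked with held-out samples or cross-validation.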