To address carbon emissions in the transportation sector, this research constructed predictive models with multiple machine learning algorithms, based on panel data from 30 provinces in China from 2005 to 2019. The study aimed to identify the optimal machine learning algorithm and the key factors influencing transportation carbon emissions, providing a useful reference for policymakers and decision-makers seeking to reduce carbon emissions and promote the sustainable development of the transportation sector. First, drawing on the concept of the fixed effects model, we included heterogeneity among provinces as an important factor. We then combined Pearson's correlation coefficient and Spearman's rank correlation coefficient to screen 18 candidate factors influencing transportation carbon emissions. Next, we made a preliminary selection of seven common machine learning algorithms and trained them with the screened factors as explanatory variables. The three best-performing algorithms were further optimized and retrained. Subsequently, we applied K-fold cross-validation, plotted learning curves to test the performance of each predictive model, and used MSE, MAE, R², and MAPE as evaluation indicators to determine the best predictive model. SHAP values were used to calculate the importance of each explanatory variable in the optimal predictive model. The results indicated that multicollinearity among seven factors (provincial differences, total consumption of social goods, urban green space area, freight turnover, number of private cars, transportation industry output, and permanent population) was weak and that all seven passed the significance test, so they could be used as explanatory variables in the prediction model of transportation carbon emissions.
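The combined Pearson and Spearman screening described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not state the selection rule, so the `threshold` value, the requirement that both coefficients exceed it, and the function name `screen_factors` are all assumptions, and the significance test mentioned in the abstract is omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient applied to ranks."""
    return pearson(ranks(x), ranks(y))


def screen_factors(candidates, target, threshold=0.5):
    """Keep factors whose |Pearson| and |Spearman| with the target both reach
    the threshold (an assumed rule; the abstract does not specify one)."""
    kept = []
    for name, values in candidates.items():
        if (abs(pearson(values, target)) >= threshold
                and abs(spearman(values, target)) >= threshold):
            kept.append(name)
    return kept
```

With made-up toy data, `screen_factors({"freight_turnover": [2, 4, 5, 8, 10], "noise": [3, 1, 4, 1, 5]}, [1, 2, 3, 4, 5])` keeps only `freight_turnover`. Using both coefficients guards against a factor that looks relevant only under a linear or only under a monotonic view of the relationship.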
The prediction results of the Random Forest and XGBoost algorithms were both outstanding, with R² values above 0.97 and errors below 10%, and neither showed signs of overfitting or underfitting. XGBoost performed the best, whereas KNN performed poorly. The importance ranking of the explanatory variables was as follows: provincial differences > total consumption of social goods > number of private cars > permanent population > freight turnover > urban green space area > transportation industry output. A comprehensive analysis of relevance and importance showed that provincial differences were an indispensable variable in the prediction of transportation carbon emissions. In conclusion, this study provides a new approach to the governance of carbon emissions in the transportation industry, and the results can serve as a reference for policymakers and decision-makers. Future policy design and decision-making should not overlook the distinctive factors of each province, and measures targeted at specific regions need to be formulated to promote the sustainable development of the transportation industry.
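The four evaluation indicators and the K-fold split named in the abstract can be written out as a short sketch. It uses the standard textbook definitions; the function names `regression_metrics` and `k_fold_indices` are hypothetical, the number of folds used in the study is not given in the abstract, and the unshuffled contiguous split shown here is an assumption.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, R2, and MAPE (in percent) for paired observations.

    MAPE divides by the true values, so y_true must not contain zeros.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MSE": ss_res / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1 - ss_res / ss_tot,
        "MAPE": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds.

    Fold sizes differ by at most one; every sample is tested exactly once.
    """
    start = 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

In use, a model would be fitted on each `train` index set, scored with `regression_metrics` on the matching `test` set, and the per-fold scores averaged. For province-by-year panel data, whether rows are shuffled or grouped by province before splitting is a design choice that affects how optimistic the scores are.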