Abstract

This paper investigates how well ensemble boosting trees forecast the volatility of China's crude oil futures by combining a rich set of feature variables with multiple volatility forecasting models. The empirical results show that the ensemble boosting tree models significantly outperform both the HAR-RV model and traditional machine learning models, with CatBoost and LightGBM delivering the best forecasting performance, and that these conclusions hold under robustness tests. Using SHAP (SHapley Additive exPlanations) values as the interpretability tool, this paper examines LightGBM and CatBoost from three angles: the drivers of volatility forecasts, the contribution of variables within specific periods, and the behavior of variables when forecasting outliers. The analysis finds that macroeconomic variables and HAR-type variables make different forecasting contributions in CatBoost and LightGBM, and that the contributions of different variables to the forecasting window vary significantly within a single interval. In addition, the paper concludes that the same predictor contributes heterogeneously across models, so variables for volatility forecasting should be selected according to the specific situation. Finally, additional analysis confirms that the ensemble boosting tree models also deliver high economic value.
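The abstract uses the HAR-RV benchmark and "HAR-type variables" without defining them. The following is a minimal sketch that assumes the standard HAR-RV specification, in which next-day realized volatility is regressed on a constant, today's RV, and RV averaged over the past 5 and 22 trading days. The paper's exact windows, data, and estimation procedure are not stated in this abstract. The function names `har_features` and `fit_har` are hypothetical. The same daily, weekly, and monthly columns could also serve as HAR-type inputs to a boosting model.

```python
import numpy as np


def har_features(rv, weekly=5, monthly=22):
    """Build HAR regressors and the one-step-ahead target.

    Assumption: the standard HAR-RV lags (1-day, 5-day, 22-day averages);
    the paper's own specification may differ.
    Each row is [1, RV_t, mean(RV_{t-4..t}), mean(RV_{t-21..t})],
    and the matching target is RV_{t+1}.
    """
    rv = np.asarray(rv, dtype=float)
    n = len(rv)
    if n <= monthly:
        raise ValueError("need more than `monthly` observations")
    rows, target = [], []
    # Start at the first day with a full monthly window; stop one day
    # before the end so that a next-day target exists.
    for t in range(monthly - 1, n - 1):
        daily = rv[t]
        week = rv[t - weekly + 1 : t + 1].mean()
        month = rv[t - monthly + 1 : t + 1].mean()
        rows.append([1.0, daily, week, month])
        target.append(rv[t + 1])
    return np.array(rows), np.array(target)


def fit_har(rv):
    """Estimate HAR-RV coefficients by ordinary least squares."""
    X, y = har_features(rv)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

For a realized-volatility series `rv`, `fit_har(rv)` returns the intercept and the daily, weekly, and monthly coefficients. A forecast for the row at day t is then the dot product of that row of `har_features(rv)[0]` with the estimated coefficients.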

