Abstract

The gross calorific value (GCV) of coal is an important parameter for evaluating coal quality, and regression analysis methods can be used to predict it. In this study, we propose a GCV prediction model based on Cubist regression. To build a good regression model, the input variables were selected using correlation analysis and a recursive feature elimination algorithm. On this basis, three sets of variables were identified as the optimal input combinations for the regression models: proximate analysis variables (Set 1: moisture, standard ash, and volatile matter), elemental analysis variables (Set 2: carbon, sulfur, and oxygen), and comprehensive index variables (Set 3: carbon, volatile matter, standard ash, sulfur, moisture, and hydrogen). The Cubist model was compared with multiple linear regression, random forest regression, and several models from previous studies, namely gradient boosting regression tree, support vector regression (SVR), backpropagation neural network, and particle swarm optimization-artificial neural network (PSO-ANN). Of the three input sets, all seven regression models fit best with the comprehensive index variables. The Cubist model showed higher prediction accuracy and lower error than most of the other models: its R², mean absolute error (MAE), root mean square error (RMSE), and average absolute relative deviation percentage (AARD%) values were 0.990, 0.476, 0.668, and 0.086% for the proximate analysis variables; 0.992, 0.381, 0.596, and 0.140% for the elemental analysis variables; and 0.999, 0.161, 0.219, and 0.087% for the comprehensive index variables. The Cubist model combines the advantages of decision trees and linear regression. This gives it good accuracy and also makes it highly interpretable, because its predictions are built from multiple linear sub-models.
In addition, the Cubist model has a clear advantage in running speed, especially over SVR and PSO-ANN, which require complex parameter optimization. In summary, the Cubist model balances prediction accuracy, model interpretability, and computational efficiency, and provides a new and effective method for GCV prediction.
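The four accuracy measures quoted in the abstract (coefficient of determination, mean absolute error, root mean square error, and average absolute relative deviation percentage) can be reproduced from paired measured and predicted GCV values. The following is a minimal Python sketch, not the authors' code. It assumes the standard textbook definitions (R² as 1 − SS_res/SS_tot, AARD% as 100 times the mean of |y − ŷ|/|y|), since the abstract does not spell out the exact formulas used. The function name `regression_metrics` and the toy values are hypothetical and are not data from the study.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute R^2, MAE, RMSE and AARD% for measured vs. predicted values.

    Assumed (standard) definitions, not taken from the study:
      R^2   = 1 - SS_res / SS_tot   (coefficient of determination)
      MAE   = mean(|y - y_hat|)
      RMSE  = sqrt(mean((y - y_hat)^2))
      AARD% = 100 * mean(|y - y_hat| / |y|)
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    if any(t == 0 for t in y_true):
        raise ValueError("AARD% is undefined when a measured value is zero")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")

    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": math.sqrt(ss_res / n),
        "AARD%": 100.0 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n,
    }


if __name__ == "__main__":
    # Hypothetical GCV values, for illustration only (not study data).
    measured = [20.0, 25.0, 30.0]
    predicted = [21.0, 24.0, 30.0]
    for name, value in regression_metrics(measured, predicted).items():
        print(f"{name}: {value:.4f}")
```

With these toy values the sketch reports R² = 0.96, MAE ≈ 0.667, RMSE ≈ 0.816, and AARD% = 3.0. Note that RMSE is always at least as large as MAE, which is consistent with every pair of values reported in the abstract.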
