The article proposes using machine learning, one of the areas of artificial intelligence, to forecast the volume of biogas produced from household organic waste. Five regression algorithms (Linear Regression, Ridge Regression, Lasso Regression, Random Forest Regression, and Gradient Boosting Regression) are considered for building an effective forecasting model. The algorithms are compared by mean squared error (MSE) and mean absolute error (MAE) to evaluate the quality of their training and their forecasting accuracy. The proposed algorithm for creating the forecasting model comprises 10 main and 3 auxiliary steps. Its advantage is that it supports component analysis of the data, performed by reducing the dimensionality of the data set, which increases interpretability while minimizing the risk of information loss. A data set of 2433 records describing biogas formation from food waste (FW) and yard waste (YW) by four features is analyzed. Data preparation is performed in Python in the Jupyter Notebook environment. Based on the research, the main advantages and disadvantages of the algorithms for building forecasting models of biogas production volumes from household organic waste are identified. Two models, “Random Forest Regressor” and “Gradient Boosting Regressor”, show the best accuracy. The other three models (Linear Regression, Ridge Regression, Lasso Regression) are less accurate and are not considered further. The accuracy of the “Random Forest Regressor” and “Gradient Boosting Regressor” models is then assessed using the MSE and MAE indicators.
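The five-model comparison can be sketched with scikit-learn, whose `LinearRegression`, `Ridge`, `Lasso`, `RandomForestRegressor`, and `GradientBoostingRegressor` classes correspond to the algorithms named above. This is a minimal sketch under stated assumptions, not the authors' code: the data are synthetic placeholders with the same shape as the described set (2433 rows, 4 features), the component-analysis step is assumed to be principal component analysis (PCA) keeping 95% of the variance, and the 80/20 split, the `evaluate` helper, and all hyperparameters are hypothetical choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in: 2433 rows x 4 features, NOT the FW/YW data set.
# Replace X and y with the real feature matrix and biogas-volume target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(2433, 4))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + X[:, 2] * X[:, 3] + rng.normal(0.0, 0.05, size=2433)

# Assumed 80/20 hold-out split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameters below are illustrative assumptions, not values from the article.
models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Lasso Regression": Lasso(alpha=0.01),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=200, random_state=42),
    "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=42),
}


def evaluate(pipe, X_part, y_part):
    """Return (MSE, MAE) of a fitted pipeline on one data split."""
    pred = pipe.predict(X_part)
    return mean_squared_error(y_part, pred), mean_absolute_error(y_part, pred)


results = {}
for name, model in models.items():
    # Assumed preprocessing: standardize, keep the principal components that
    # explain 95% of the variance (PCA), then fit the regressor on them.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95), model)
    pipe.fit(X_train, y_train)
    results[name] = {
        "train": evaluate(pipe, X_train, y_train),
        "test": evaluate(pipe, X_test, y_test),
    }

for name, r in results.items():
    print(f"{name:28s} train MSE={r['train'][0]:.4f} MAE={r['train'][1]:.4f} | "
          f"test MSE={r['test'][0]:.4f} MAE={r['test'][1]:.4f}")
```

Keeping train and test errors side by side for every model makes the same train-versus-test comparison used in the study easy to read off; any numbers printed here come from the synthetic data and say nothing about the real FW/YW results.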
The Random Forest Regressor proves more accurate than the Gradient Boosting Regressor: on the training data set, its MSE is 7.14 times smaller and its MAE 2.67 times smaller than those of the Gradient Boosting Regressor. Both models show higher MSE and MAE on the test data set than on the training data set, which indicates a tendency toward overfitting. The Gradient Boosting Regressor has worse MSE and MAE than the Random Forest Regressor on both the training and the test data sets. The model based on the “Random Forest Regressor” algorithm is therefore established as the most effective for forecasting the volume of biogas production from household organic waste: it achieves MAE = 0.088 on the test data and the smallest absolute prediction errors. Further systematic improvement of the “Random Forest Regressor” model on new data will help maintain its accuracy and competitive advantages.
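The "times smaller" comparisons and the train-versus-test check can be made concrete in plain Python. The sketch below defines MSE and MAE, the ratio used to say one model's error is k times smaller than another's, and a test-to-train error ratio as a simple overfitting signal. The toy arrays and the helper names `times_smaller` and `generalization_gap` are hypothetical illustrations, not values or functions from the study.

```python
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    """Reject empty or mismatched inputs before computing an error metric."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: the average of squared residuals."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average of absolute residuals."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def times_smaller(better: float, worse: float) -> float:
    """How many times `better` is smaller than `worse` (worse / better)."""
    return worse / better


def generalization_gap(train_err: float, test_err: float) -> float:
    """Test-to-train error ratio; values above 1 mean error grows on unseen data."""
    return test_err / train_err


# Hypothetical toy values, not data from the study.
y_true = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.1, 1.9, 3.2, 3.9]  # a more accurate hypothetical model
pred_b = [1.5, 1.6, 3.5, 3.4]  # a less accurate hypothetical model

mse_ratio = times_smaller(mse(y_true, pred_a), mse(y_true, pred_b))
mae_ratio = times_smaller(mae(y_true, pred_a), mae(y_true, pred_b))
print(f"Model A: MSE {mse_ratio:.2f}x smaller, MAE {mae_ratio:.2f}x smaller than model B")
```

Read against the abstract, calling `times_smaller` with the Random Forest error as `better` and the Gradient Boosting error as `worse` presumably yields the reported 7.14 (MSE) and 2.67 (MAE) ratios, and a `generalization_gap` above 1 for both models corresponds to the reported loss of accuracy on the test set.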