A Comparative Analysis of Machine Learning Algorithms for Predicting Paddy Production

Nanda Aditya, Ibnu Rasyid Munthe, Volvo Sihombing

doi:10.33395/sinkron.v8i2.13666

Abstract

Food security is a critical issue for countries with large populations, such as Indonesia. Most of Indonesia's population depends on rice as a staple food, and paddy is one of the most widely cultivated food commodities. Accurate predictions of national paddy production support decisions on national production targets for the coming period, and predicting paddy availability helps ensure supply and price stability. Many studies have used machine learning to predict crop yields. By learning important patterns and relationships from input data, machine learning can combine the advantages of other methods to make better predictions of paddy yields. This research conducts a comparative analysis of three machine learning algorithms, namely random forest, decision tree, and k-nearest neighbors, for predicting paddy production. To determine which algorithm performs best, the models are evaluated using the coefficient of determination (R2-score), mean absolute error (MAE), and mean squared error (MSE). The methodology proceeds through five stages: dataset collection, data preprocessing, splitting the dataset into training and testing sets, applying the algorithms, and evaluating the models. The random forest algorithm obtained an R2-score of 82.38%, an MAE of 261726.20, and an MSE of 2.19495E+11. The decision tree obtained an R2-score of 79.62%, an MAE of 323257.99, and an MSE of 2.49304E+11. K-nearest neighbors obtained an R2-score of 76.25%, an MAE of 318433.42, and an MSE of 2.90577E+11. These results show that random forest is the best of the three algorithms for predicting paddy production, because it obtains the highest R2-score and the lowest MAE and MSE.
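The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of R2, MAE, and MSE, and it is not the authors' code. The function names, the model labels `model_a` and `model_b`, and the toy numbers are all hypothetical and unrelated to the paddy dataset. The sketch also assumes the best model is chosen by the highest R2-score, which agrees with the lowest MAE and MSE in this toy case.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mse(y_true, y_pred):
    """Mean squared error: average of (actual - predicted)^2."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rank_models(y_true, predictions):
    """Score every model and return (scores, name of the highest-R2 model)."""
    scores = {
        name: {
            "r2": r2_score(y_true, y_pred),
            "mae": mae(y_true, y_pred),
            "mse": mse(y_true, y_pred),
        }
        for name, y_pred in predictions.items()
    }
    best = max(scores, key=lambda name: scores[name]["r2"])
    return scores, best


# Hypothetical toy values, NOT taken from the study's data.
y_true = [100.0, 200.0, 300.0, 400.0]
predictions = {
    "model_a": [110.0, 190.0, 310.0, 390.0],
    "model_b": [150.0, 150.0, 350.0, 350.0],
}

scores, best = rank_models(y_true, predictions)
for name, s in scores.items():
    # R2 is printed as a percentage, matching how the abstract reports it.
    print(f"{name}: R2={s['r2'] * 100:.2f}%  MAE={s['mae']:.2f}  MSE={s['mse']:.2f}")
print("best by R2:", best)
```

A higher R2-score means the model explains more of the variance in the target, while MAE and MSE measure error in the target's own units and squared units. This is why the MSE values in the abstract are many orders of magnitude larger than the MAE values.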
