Abstract

Rainfall is a crucial weather parameter in the context of Bangladesh. Accurate rainfall prediction can support decision making in the country's agriculture and natural disaster management. However, the chaotic nature of rainfall due to climate change makes prediction difficult for traditional statistical models. In this study, we analyze the performance of six machine learning algorithms in predicting daily rainfall as both a regression and a classification task: Decision Tree (DT), K-Nearest Neighbours (KNN), Random Forest (RF), Extreme Gradient Boosting (XGB), Light Gradient Boosting (LGB) and Multi-Layered Perceptron (MLP). In addition, we try an approach called Zero Inflated Regression (ZIR) to address the large share of zero-rainfall values in the dataset. The models were trained with and without feature selection and, for classification, with and without sampling techniques. During training, 10-fold cross validation and hyperparameter tuning were performed on the train set, and the selected models were then applied to the test set for evaluation. For regression, LGB with SelectKBest feature selection performed best on the test set, with an R² score of 0.203, MAE of 6.40 and RMSE of 15.44. Among the classifiers, XGB with no feature selection and no sampling technique achieved the best test accuracy of 0.787 and a test macro F1-score of 0.62. The ZIR model consisting of an XGB classifier and an LGB regressor with no feature selection yielded an R² score of 0.189, MAE of 5.789 and RMSE of 15.575 on the test set. Interestingly, the ZIR models produced lower MAE than the regression models, but the regression models had better R² scores.
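
The abstract does not spell out how the ZIR classifier and regressor are combined. A common formulation, assumed here, is that the classifier decides whether a day is wet, the regressor is trained only on wet days, and the final prediction is zero for days classified as dry. The sketch below illustrates that composition in plain Python. `ThresholdClassifier` and `MeanRegressor` are hypothetical toy stand-ins for the XGB classifier and LGB regressor, and the single humidity-like feature and its values are invented for illustration only.

```python
from statistics import mean


class ThresholdClassifier:
    """Toy stand-in for a real classifier (hypothetical, not the study's XGB)."""

    def fit(self, X, labels):
        wet = [x[0] for x, lab in zip(X, labels) if lab == 1]
        dry = [x[0] for x, lab in zip(X, labels) if lab == 0]
        # Midpoint between the mean feature value of wet and dry days.
        self.threshold = (mean(wet) + mean(dry)) / 2
        return self

    def predict(self, X):
        return [1 if x[0] > self.threshold else 0 for x in X]


class MeanRegressor:
    """Toy stand-in for a real regressor (hypothetical, not the study's LGB)."""

    def fit(self, X, y):
        self.value = mean(y)
        return self

    def predict(self, X):
        return [self.value for _ in X]


class ZeroInflatedRegressor:
    """Assumed ZIR composition: classify wet/dry, regress on wet days only."""

    def __init__(self, classifier, regressor):
        self.classifier = classifier
        self.regressor = regressor

    def fit(self, X, y):
        # Stage 1: binary target, 1 if any rain fell, else 0.
        labels = [1 if amount > 0 else 0 for amount in y]
        self.classifier.fit(X, labels)
        # Stage 2: fit the regressor on the non-zero rainfall days only.
        wet_X = [x for x, amount in zip(X, y) if amount > 0]
        wet_y = [amount for amount in y if amount > 0]
        self.regressor.fit(wet_X, wet_y)
        return self

    def predict(self, X):
        is_wet = self.classifier.predict(X)
        amounts = self.regressor.predict(X)
        # Days classified as dry are forced to exactly zero rainfall.
        return [a if w == 1 else 0.0 for w, a in zip(is_wet, amounts)]


# Invented toy data: one feature per day, rainfall in mm.
X_train = [[40.0], [45.0], [50.0], [55.0], [80.0], [85.0], [90.0], [95.0]]
y_train = [0.0, 0.0, 0.0, 0.0, 4.0, 8.0, 12.0, 16.0]

model = ZeroInflatedRegressor(ThresholdClassifier(), MeanRegressor()).fit(X_train, y_train)
predictions = model.predict([[42.0], [88.0]])
print(predictions)  # [0.0, 10.0]
```

Because dry days are predicted as exactly zero, such a model makes no error on correctly classified dry days, which may help explain why ZIR can lower MAE on a zero-heavy dataset even when its R² score is not higher.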
