Abstract

Predicting rain duration from Meteorology, Climatology, and Geophysics Agency (BMKG) data is an important task but remains an open problem. Several studies have also shown that missing values degrade a model's predictive performance. This study proposes k-nearest neighbors (KNN) imputation to handle missing values when predicting rain duration. The dataset is drawn from BMKG data. For the regression model, we compared gradient boosting regression (GBR), adaptive boosting regression (ABR), and linear regression (LR). We compared KNN imputation with several benchmark methods, including zero imputation, mean imputation, and iterative imputation. The performance of these imputation methods is measured with r2, mean squared error (MSE), and mean bias error (MBE). The results show that GBR performs best among the regression methods on both training and test data, with r2 = 0.915 and 0.776, respectively. The proposed KNN imputation outperforms the benchmark imputation methods. With KNN imputation at a missing percentage of 90%, the prediction r2 and MSE are 0.71 and 0.36, respectively.
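The comparison described above can be sketched with scikit-learn. This is a minimal illustration, not the study's implementation. The data are synthetic, not BMKG data. The feature count, the 30% missing fraction, `n_neighbors=5`, and the helper names `mask_at_random` and `mean_bias_error` are all assumptions made for the example. The MBE sign convention used here (prediction minus observation) is one common choice and may differ from the paper's.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, enables IterativeImputer
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for weather features and rain duration (assumption: NOT BMKG data).
n_samples, n_features = 400, 5
X_full = rng.normal(size=(n_samples, n_features))
y = X_full @ np.array([1.5, -2.0, 0.5, 0.0, 1.0]) + 0.1 * rng.normal(size=n_samples)


def mask_at_random(X, missing_fraction, rng):
    """Hypothetical helper: replace a random fraction of entries with NaN."""
    X_missing = X.copy()
    X_missing[rng.random(X.shape) < missing_fraction] = np.nan
    return X_missing


def mean_bias_error(y_true, y_pred):
    """MBE as mean(prediction - observation); the sign convention is an assumption."""
    return float(np.mean(np.asarray(y_pred) - np.asarray(y_true)))


# The four imputation strategies named in the abstract.
imputers = {
    "zero": SimpleImputer(strategy="constant", fill_value=0.0),
    "mean": SimpleImputer(strategy="mean"),
    "iterative": IterativeImputer(random_state=0, max_iter=10),
    "knn": KNNImputer(n_neighbors=5),
}

X_train, X_test, y_train, y_test = train_test_split(
    X_full, y, test_size=0.25, random_state=0
)
X_train_missing = mask_at_random(X_train, 0.3, rng)
X_test_missing = mask_at_random(X_test, 0.3, rng)

results = {}
for name, imputer in imputers.items():
    # Fit the imputer on training data only, then apply it to the test data.
    X_train_imputed = imputer.fit_transform(X_train_missing)
    X_test_imputed = imputer.transform(X_test_missing)

    model = GradientBoostingRegressor(random_state=0)
    model.fit(X_train_imputed, y_train)
    y_pred = model.predict(X_test_imputed)

    results[name] = {
        "r2": r2_score(y_test, y_pred),
        "mse": mean_squared_error(y_test, y_pred),
        "mbe": mean_bias_error(y_test, y_pred),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Because the synthetic features are independent, the ranking of imputers in this sketch says nothing about the ranking reported for the BMKG data. KNN imputation tends to help most when features are correlated, so that neighboring samples carry information about the missing entries.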
