Abstract
Machine learning methods are helpful for solving a variety of problems, especially those involving large data sets. The mortality rate is one such problem: it fluctuates depending on the factors that influence it, one of which is air quality. Two methods that can be used to predict the mortality rate of a population are Random Forest and Extreme Gradient Boosting (XGBoost), both of which are ensemble methods with decision trees as the base model. Missing values in the data lower the accuracy of the predictions. In this paper, we discuss how to handle missing values and compare the accuracy of the ensemble methods used to predict the mortality rate. The simulation results show that missing values in the data are best handled by removing them (Drop NaN). The Mean Squared Error (MSE) values obtained with the Random Forest and XGBoost methods are 0.007239 ± (1.699 × 10−7) and 0.04019, respectively. Based on these MSE values, Random Forest predicts the mortality rate affected by air quality more accurately than XGBoost.
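The two operations the abstract relies on, removing missing values and scoring predictions by MSE, can be illustrated with a minimal sketch. It uses only the Python standard library and assumes that Drop NaN means discarding every record that contains a missing value. The helper names, the toy records, and the two sets of predictions are hypothetical and are not taken from the paper's data or results.

```python
import math


def drop_nan_rows(rows):
    """Keep only the rows in which no value is NaN (assumed meaning of Drop NaN)."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy records: (air_quality_feature_1, air_quality_feature_2, mortality_rate)
records = [
    (0.2, 0.5, 0.10),
    (float("nan"), 0.4, 0.12),  # dropped: first feature is missing
    (0.3, float("nan"), 0.11),  # dropped: second feature is missing
    (0.6, 0.7, 0.20),
]

clean = drop_nan_rows(records)
observed = [row[2] for row in clean]

# Hypothetical predictions standing in for the output of two fitted models.
pred_a = [0.11, 0.19]
pred_b = [0.15, 0.25]

mse_a = mean_squared_error(observed, pred_a)
mse_b = mean_squared_error(observed, pred_b)

# The model with the lower MSE is the more accurate one on these records.
better = "model A" if mse_a < mse_b else "model B"
print(len(clean), better)
```

In practice the predictions would come from fitted Random Forest and XGBoost regressors evaluated on held-out data. The comparison step stays the same: the method with the lower MSE is judged the more accurate.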