Random Forest (RF) has emerged as a prominent machine learning algorithm that addresses problems arising from the use of single decision trees, such as overfitting and instability. However, conventional RF is a global model: it fits one set of relationships to the whole study area and may therefore fail to capture spatial variation. In the analysis of public health development, the relationship between the level of health development and its risk factors can vary across space. To address this challenge, we use a modified RF algorithm called Geographically Weighted Random Forest (GW-RF). As a tree-based, non-parametric machine learning model, GW-RF can help explore and visualize relationships between the Public Health Development Index (PHDI), as the response variable, and district-level indicators as predictors. We compare GW-RF output with that of a global RF model using 2018 data and six independent variables: the percentage of the population with access to clean/decent water (X1), per capita weekly consumption of eggs and milk (X2), the number of healthcare facilities per 1000 people (X3), the number of doctors per 1000 people (X4), the female/male ratio of the pure participation rate (X5), and the percentage of households that have hand-washing facilities with soap and water (X6). Our results show that the non-parametric GW-RF model has high potential for explaining spatial heterogeneity and predicting PHDI compared with a global model when these six major risk factors are included. However, some of these predictions carry little meaning. The spatial heterogeneity found using GW-RF shows the need to consider local factors in efforts to increase PHDI values. Spatial analysis of PHDI provides valuable information for determining geographic targets, that is, areas whose PHDI values need to be improved.
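The contrast between a global RF and GW-RF can be illustrated with a minimal Python sketch. It is not the implementation used in this study. It assumes scikit-learn's `RandomForestRegressor`, an adaptive neighbourhood of the `n_neighbors` nearest districts, and bi-square distance weights passed as sample weights. The function names, the bandwidth choice, and the synthetic data (six predictors standing in for X1 to X6) are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_gw_rf(coords, X, y, n_neighbors=20, n_estimators=20, seed=0):
    """Fit one local random forest per location (hypothetical helper).

    Assumptions: adaptive bandwidth given by the n_neighbors nearest
    locations, with bi-square kernel weights used as sample weights.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    local_models = []
    for i in range(len(coords)):
        # Euclidean distance from location i to every location (itself included).
        dist = np.linalg.norm(coords - coords[i], axis=1)
        idx = np.argsort(dist)[:n_neighbors]
        # Bi-square kernel: weight 1 at the focal location, 0 at the bandwidth edge.
        bandwidth = dist[idx].max()
        weights = (1.0 - (dist[idx] / bandwidth) ** 2) ** 2
        rf = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
        rf.fit(X[idx], y[idx], sample_weight=weights)
        local_models.append(rf)
    return local_models


# Synthetic illustration only: 60 "districts" with six predictors.
rng = np.random.default_rng(0)
n = 60
coords = rng.uniform(0.0, 1.0, size=(n, 2))
X = rng.normal(size=(n, 6))
# The effect of the first predictor grows from west to east (spatial heterogeneity).
y = 3.0 * coords[:, 0] * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=n)

# Global model: a single forest for the whole study area.
global_rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Local models: one forest per district, each with its own variable importances.
local_models = fit_gw_rf(coords, X, y)
imp = np.vstack([m.feature_importances_ for m in local_models])
local_pred = np.array(
    [m.predict(X[i:i + 1])[0] for i, m in enumerate(local_models)]
)

print("global importances:", np.round(global_rf.feature_importances_, 2))
print("local importance of the first predictor, min/max:",
      imp[:, 0].min().round(2), imp[:, 0].max().round(2))
```

Each row of `imp` holds the variable importances of one local forest, so mapping a column of `imp` against `coords` is one way to visualize how the influence of a single risk factor changes across districts, whereas the global forest yields only one importance value per factor.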