The random forest is a powerful supervised learning model, recognized for its ability to learn patterns in data and deliver high predictive accuracy. However, it is regarded as a black-box model because its predictions are difficult to interpret. This study addressed the interpretability challenge by employing the inTree framework. Rules were extracted from each decision tree in a random forest model, and association rules were then identified using the support and confidence measures to reveal the frequent variable interactions that predict unemployment. This approach provided insight into the relationships between specific variables and unemployment outcomes. The method was applied to data from the Integrated Labor Force Survey (ILFS) 2020/2021 in Zanzibar, where the unemployment rate increased consistently across the surveys conducted in 2006, 2014, and 2020/2021. The results show that the rules most predictive of individual unemployment, each with a high confidence level, are (i) being female, lacking health insurance, and having a secondary education level, and (ii) being female, belonging to the youth age group, lacking health insurance, and having a secondary education level. This study provides practical insights for Zanzibar’s government to develop effective interventions, programs, and policies. Improving the interpretability of the random forest model supports better decision-making to address unemployment challenges.
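To make the support and confidence measures concrete, the following is a minimal sketch of how a single extracted rule could be scored. It assumes the standard association-rule definitions (support as the share of all records that satisfy both the condition and the outcome, confidence as the share of condition-matching records that have the outcome); the study's exact definitions may differ. The records, field names, and the `rule_metrics` helper are hypothetical and are not taken from the ILFS 2020/2021 data.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical toy records for illustration only (NOT ILFS data).
records: List[Record] = [
    {"sex": "female", "insured": "no", "education": "secondary", "status": "unemployed"},
    {"sex": "female", "insured": "no", "education": "secondary", "status": "unemployed"},
    {"sex": "female", "insured": "no", "education": "secondary", "status": "employed"},
    {"sex": "male", "insured": "yes", "education": "tertiary", "status": "employed"},
    {"sex": "female", "insured": "yes", "education": "secondary", "status": "employed"},
    {"sex": "male", "insured": "no", "education": "primary", "status": "unemployed"},
    {"sex": "male", "insured": "no", "education": "secondary", "status": "employed"},
    {"sex": "female", "insured": "yes", "education": "tertiary", "status": "employed"},
]


def rule_metrics(
    data: List[Record], condition: Dict[str, str], outcome: str
) -> Tuple[float, float]:
    """Return (support, confidence) for the rule `condition -> status == outcome`.

    Assumed definitions (standard association-rule mining):
      support    = |condition and outcome| / |data|
      confidence = |condition and outcome| / |condition|
    """
    # Records that satisfy every attribute test in the rule's condition.
    matched = [r for r in data if all(r.get(k) == v for k, v in condition.items())]
    # Of those, the records that also have the predicted outcome.
    hits = sum(1 for r in matched if r["status"] == outcome)
    support = hits / len(data) if data else 0.0
    confidence = hits / len(matched) if matched else 0.0
    return support, confidence


# Rule mirroring the form reported in the abstract:
# female AND no health insurance AND secondary education -> unemployed
rule = {"sex": "female", "insured": "no", "education": "secondary"}
support, confidence = rule_metrics(records, rule, "unemployed")
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data, three records match the condition and two of them are unemployed, so the rule covers a quarter of all records and holds for two thirds of the records it matches. A rule with high confidence but very low support describes few individuals, which is why both measures are typically reported together.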