Abstract
Purpose: This study compares two tree ensembles, Random Forest (RF) and Double Random Forest (DRF), from the viewpoint of model interpretability. Both models have strong predictive performance, but their inner workings are not readily understandable to humans. Model interpretability is needed to explain the relationship between the predictors and the response. We apply association rules to distil the essence of the models.

Methods: This study compares the interpretability of RF and DRF using association rules. Each decision tree in each model is converted into if-then rules by following the paths from the root node to the leaf nodes. The data were chosen to be underfit data, because other researchers have shown that DRF overcomes the underfitting problem faced by RF. A simulation study was conducted to evaluate the rules extracted from RF and DRF. The rules extracted from both models are compared in terms of model interpretability, based on their support and confidence values. Association rules are also applied to identify the characteristics of poor people who are working in Yogyakarta.

Result: The simulation results revealed that DRF outperformed RF in interpretability, especially when modelling underfit data. In addition, using empirical data, we were able to characterize the profile of poor people who are working in Yogyakarta based on the most frequent rules.

Novelty: Research on interpretable DRF is still rare, especially interpretation using association rules. Previous studies focused only on interpreting the random forest model using association rules. In this study, the rules extracted from the random forest and double random forest models are compared based on the quality of the extracted rules.
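The rule-extraction step described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tree, the feature names (`income`, `education`), the thresholds, and the data rows are all hypothetical, and a single hand-built tree stands in for one member of an RF or DRF ensemble. The sketch follows every root-to-leaf path to form an if-then rule, then scores each rule with the usual association-rule definitions: support is the fraction of all rows that satisfy both the conditions and the predicted label, and confidence is the fraction of rows satisfying the conditions that also carry the predicted label.

```python
# Hypothetical toy tree: internal nodes test "feature <= threshold",
# leaves carry a class label. Not taken from the study.
TREE = {
    "feature": "income", "threshold": 2.0,
    "left": {"label": "poor"},
    "right": {
        "feature": "education", "threshold": 9.0,
        "left": {"label": "poor"},
        "right": {"label": "not_poor"},
    },
}

# Hypothetical data rows, invented purely for illustration.
ROWS = [
    {"income": 1.0, "education": 6, "label": "poor"},
    {"income": 1.5, "education": 12, "label": "poor"},
    {"income": 1.8, "education": 9, "label": "not_poor"},
    {"income": 3.0, "education": 6, "label": "poor"},
    {"income": 3.0, "education": 8, "label": "not_poor"},
    {"income": 2.5, "education": 12, "label": "not_poor"},
    {"income": 4.0, "education": 12, "label": "not_poor"},
    {"income": 5.0, "education": 16, "label": "not_poor"},
]


def extract_rules(node, conditions=()):
    """Return one (conditions, label) rule per root-to-leaf path."""
    if "label" in node:  # leaf: the accumulated path becomes the antecedent
        return [(conditions, node["label"])]
    feat, thr = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + ((feat, "<=", thr),))
    right = extract_rules(node["right"], conditions + ((feat, ">", thr),))
    return left + right


def matches(row, conditions):
    """True if the row satisfies every condition in the antecedent."""
    for feat, op, thr in conditions:
        value = row[feat]
        if op == "<=" and not value <= thr:
            return False
        if op == ">" and not value > thr:
            return False
    return True


def support_confidence(rule, rows):
    """Support = P(antecedent and label); confidence = P(label | antecedent)."""
    conditions, label = rule
    covered = [r for r in rows if matches(r, conditions)]
    correct = [r for r in covered if r["label"] == label]
    support = len(correct) / len(rows)
    confidence = len(correct) / len(covered) if covered else 0.0
    return support, confidence


RULES = extract_rules(TREE)
for conditions, label in RULES:
    antecedent = " AND ".join(f"{f} {op} {t}" for f, op, t in conditions)
    s, c = support_confidence((conditions, label), ROWS)
    print(f"IF {antecedent} THEN {label} (support={s:.3f}, confidence={c:.3f})")
```

For an ensemble, the same extraction would be repeated over every tree and the resulting rules pooled, so that the most frequent rules and their support and confidence values can be compared between models.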