Abstract

As a popular machine learning model, decision trees classify and generalize well, but they face two challenges in engineering applications: (1) sensitivity to perturbations and a lack of interpretability, owing to their reliance on correlations; (2) a manually set stopping criterion that is unrelated to correlation strength and easily leads to over-partitioning. To address these challenges, we first analyze theoretically what leads to sub-optimal decision trees. Through the lens of causal discovery, this limitation can be attributed to the fact that trees grown on spurious correlations often fall into sub-optimal solutions that cause overfitting and unfair behavior. This neglect of causality motivates us to develop a ‘better’ tree with low Kolmogorov complexity and high generalization capability. We then propose CausalDT, a causal decision tree framework based on this theoretical expectation, in which the Hilbert-Schmidt independence criterion serves as the baseline measure. Unlike previous approaches that prioritize relevance, our framework selects branch nodes based on causation between features, with a significance level determining whether the tree should be expanded further. Experimental results demonstrate that our model maintains predictive performance while reducing average tree depth by 35% across various datasets. Furthermore, our model enhances decision fairness and interpretability.
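The abstract only names the Hilbert-Schmidt independence criterion (HSIC) and a significance-level stopping rule; it gives no implementation details. The sketch below is one illustrative reading of how an HSIC statistic with a permutation-test p-value could gate whether a node is expanded. The function names (`rbf_kernel`, `hsic`, `hsic_split_test`), the RBF kernel with a median-heuristic bandwidth, and the permutation test are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(x, sigma=None):
    """RBF kernel matrix for a 1-D sample vector x."""
    d2 = (x[:, None] - x[None, :]) ** 2
    if sigma is None:
        # Median heuristic for the bandwidth (assumed choice);
        # fall back to 1.0 if all samples coincide.
        pos = d2[d2 > 0]
        sigma = np.sqrt(0.5 * np.median(pos)) if pos.size else 1.0
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y):
    """Biased HSIC estimator: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K, L = rbf_kernel(x), rbf_kernel(y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def hsic_split_test(x, y, alpha=0.05, n_perm=200, seed=None):
    """Permutation test on HSIC(x, y).

    Returns (statistic, p_value, expand): under this reading of the
    framework, the node would only be expanded on feature x when the
    dependence is significant at level alpha.
    """
    rng = np.random.default_rng(seed)
    stat = hsic(x, y)
    null = [hsic(x, rng.permutation(y)) for _ in range(n_perm)]
    p_value = (1 + sum(s >= stat for s in null)) / (1 + n_perm)
    return stat, p_value, p_value < alpha

# Hypothetical usage at a candidate node:
# stat, p, expand = hsic_split_test(x_node, y_node, alpha=0.05)
# if not expand: stop growing this branch (significance-level stopping rule)
```

Because the stopping decision comes from a significance test rather than a hand-tuned depth or impurity threshold, growth halts automatically once no candidate feature shows significant dependence, which is consistent with the depth reduction the abstract reports.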
