Abstract

Impurity functions are central to decision trees: they measure the impurity of a node and thereby guide the choice of split. Two ambiguities have nonetheless surrounded them: (1) whether an impurity function must be nonnegative and (2) whether it must be concave. In this paper, we address both questions by examining the properties of impurity functions. We establish that the nonnegativity of an impurity function is inconsequential. Through counterexamples, we show that impurity functions and concave functions are not equivalent: we identify an impurity function that is not concave and a concave function that is not an impurity function. We also exhibit an impurity function that yields a negative impurity reduction. Furthermore, we verify several significant properties of impurity functions. For example, we demonstrate that when an impurity function is concave, the impurity reduction remains nonnegative for multiway splits. We also discuss sufficient conditions for a concave function to be an impurity function. Our numerical results further indicate that a positive linear combination of the two most popular impurity functions, the Gini index and entropy, may outperform either one alone on the well-known German credit dataset.
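To make the quantities named in the abstract concrete, the following is a minimal sketch of the Gini index, entropy, a positive linear combination of the two, and the impurity reduction of a multiway split. The function names, the equal weights of 0.5, and the toy class counts are assumptions chosen for illustration; they are not taken from the paper, and the counts are not drawn from the German credit dataset.

```python
import math

def gini(p):
    """Gini index of a class-probability vector p."""
    return 1.0 - sum(q * q for q in p)

def entropy(p):
    """Shannon entropy (natural log) of p; terms with q == 0 contribute 0."""
    return -sum(q * math.log(q) for q in p if q > 0)

def combined(p, a=0.5, b=0.5):
    """a*Gini + b*Entropy with a, b > 0 (weights here are illustrative only)."""
    return a * gini(p) + b * entropy(p)

def class_probs(counts):
    """Turn per-class counts into class proportions."""
    total = sum(counts)
    return [c / total for c in counts]

def impurity_reduction(impurity, parent_counts, child_counts_list):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = sum(parent_counts)
    weighted = sum(
        sum(child) / n * impurity(class_probs(child))
        for child in child_counts_list
    )
    return impurity(class_probs(parent_counts)) - weighted

# Hypothetical three-way split of a node holding 100 samples in 3 classes.
parent = [40, 30, 30]
children = [[30, 5, 5], [5, 20, 5], [5, 5, 20]]

for name, f in [("gini", gini), ("entropy", entropy), ("combined", combined)]:
    print(name, impurity_reduction(f, parent, children))
```

Because the Gini index and entropy are both concave, and a positive linear combination of concave functions is concave, all three reductions printed here are nonnegative, in line with the multiway-split property stated above.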

