Abstract

We introduce a method for selecting a small subset of informative, non-redundant predictors from a set of input variables, given an output variable. The core of this method is a novel measure of variable importance, which enhances the so-called “conditional permutation importance” (CPI). In CPI, the importance of an input variable is measured by the expected increase in the prediction error of a random forest (RF) when that variable is randomly permuted within certain groups of observations. While CPI obtains these groups from the stochastic recursive partitions that the RF carries out on the input space, our measure groups observations by means of a special form of clustering, which optimally leverages the structure of dependencies existing between input variables. We show that our measure can be used effectively to recursively eliminate both unimportant and redundant input variables. Extensive experimental results illustrate the effectiveness of our method in comparison with many RF-based methods for variable selection.
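
To make the idea of permuting a variable "within groups of observations" concrete, here is a minimal sketch of a grouped permutation importance. It is not the method of this paper: the group labels are assumed to be given (the clustering procedure is not reproduced), the model is any callable rather than a random forest, and all function names (`mse`, `permute_within_groups`, `grouped_permutation_importance`) are hypothetical.

```python
import random
from collections import defaultdict


def mse(predict, X, y):
    """Mean squared error of `predict` over rows X with targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permute_within_groups(X, j, groups, rng):
    """Return a copy of X with column j shuffled separately inside each group."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    X_perm = [list(row) for row in X]
    for idx in members.values():
        values = [X[i][j] for i in idx]
        rng.shuffle(values)  # values never leave their group
        for i, v in zip(idx, values):
            X_perm[i][j] = v
    return X_perm


def grouped_permutation_importance(predict, X, y, j, groups,
                                   n_repeats=20, seed=0):
    """Average increase in MSE after permuting column j within groups.

    `groups` is an assumed, externally supplied label per observation;
    how to choose it is exactly what CPI and the proposed measure differ on.
    """
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    increases = [
        mse(predict, permute_within_groups(X, j, groups, rng), y) - baseline
        for _ in range(n_repeats)
    ]
    return sum(increases) / n_repeats
```

Putting every observation in a single group recovers the ordinary (unconditional) permutation importance. With finer groups, the permuted variable can only vary as much as it varies inside each group, so a variable that is essentially constant within groups receives an importance close to zero.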
