Abstract

We develop a new approach for feature selection via gain penalization in tree-based models. First, we show that previous methods do not perform sufficient regularization and often exhibit sub-optimal out-of-sample performance, especially when correlated features are present. We then develop a new gain penalization idea that provides a general local-global regularization scheme for tree-based models. The new method allows full flexibility in the choice of feature-specific importance weights while also applying a global penalization. We validate our method on both simulated and real data, exploring how the hyperparameters interact, and we provide the implementation as an extension of the popular R package ranger.

Highlights

  • In many Machine Learning problems, features can be hard or economically expensive to obtain, and some may be irrelevant or poorly linked to the target

  • For tree-based methods, there is no standard procedure for feature selection or regularization in the literature, as one would find for linear regression with the LASSO [2], for example

  • We provide a general gain penalization procedure for tree-based models, which allows for a combination of local and global regularization parameters

Summary

INTRODUCTION

In many Machine Learning problems, features can be hard or economically expensive to obtain, and some may be irrelevant or only poorly linked to the target. In [5], the authors treat trees as parametric models and use procedures analogous to LASSO-type shrinkage methods, penalizing the coefficients of the base learners and reducing the redundancy in each path from the root node to a leaf node. The features they select can still be redundant, however, since the focus is on reducing the number of rules rather than the number of features. We provide a general gain penalization procedure for tree-based models, which allows for a combination of local and global regularization parameters.
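The local-global idea can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes a penalized gain of the form λ_i · Gain(X_i, t) for features not yet used in the tree (used features keep their full gain, which encourages re-use of a small feature subset), and a mixing rule λ_i = (1 − γ) λ_global + γ g_i combining one global penalty with feature-specific local weights g_i. The function names and the exact mixing rule are illustrative assumptions.

```python
import numpy as np

def mixed_penalty(lambda_global, gamma, local_weights):
    # Hypothetical local-global mixing: one global penalty lambda_global
    # blended with feature-specific weights g_i via a mixture parameter gamma.
    return (1.0 - gamma) * lambda_global + gamma * np.asarray(local_weights, dtype=float)

def penalized_gain(raw_gains, lambdas, used_features):
    # Down-weight the split gain of features not yet used in the tree;
    # features already in the tree keep their full gain.
    raw_gains = np.asarray(raw_gains, dtype=float)
    idx = np.arange(raw_gains.size)
    factor = np.where(np.isin(idx, list(used_features)), 1.0, np.asarray(lambdas, dtype=float))
    return raw_gains * factor

# Example: three candidate features with equal raw gain; feature 0 is
# already used in the tree, so only features 1 and 2 are penalized.
lambdas = mixed_penalty(lambda_global=0.5, gamma=0.5, local_weights=[0.2, 0.8, 1.0])
gains = penalized_gain([1.0, 1.0, 1.0], lambdas, used_features={0})
# gains -> [1.0, 0.65, 0.75]: the split search would now favor feature 0.
```

Smaller λ_i means stronger penalization; setting γ = 0 recovers a purely global penalty, while γ = 1 uses only the feature-specific weights.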

PROBLEM SETUP
GENERALIZING GAIN PENALIZATION
DEPTH PARAMETER
EXPERIMENTS
GENERALIZED GAIN PENALIZATION IN RANDOM FORESTS
IMPLEMENTATION
CONCLUSION AND NEXT STEPS