Abstract

In this paper we propose a novel approach to feature selection in machine learning based on Sobol sensitivity analysis, a variance-based technique that quantifies the contribution of individual features, and of their interactions, to the overall variance of the target variable. Like wrapper methods, Sobol sensitivity analysis is model-based: it uses the trained model to evaluate feature importance. Like embedded methods, it trains the model on the full feature set. From that single trained model it computes importance scores and, like filter methods, identifies the subset of features with the highest scores without retraining. The distinctive characteristic of the Sobol approach is its computational efficiency relative to existing feature selection algorithms, since importance scores for all individual features and subsets of features are calculated from the same trained model. We apply the proposed algorithm to a simulated data set and to four benchmark data sets from the machine learning literature, compare the results with those of two widely used feature selection algorithms, and discuss some computational aspects.
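
The workflow the abstract describes, train once on all features, score every feature from that single model, keep the top scorers, can be sketched as follows. This is only an illustrative reading using scikit-learn and the SALib package, not the authors' implementation; the model choice (a random forest), the Saltelli sample size, and the 0.05 selection threshold are assumptions made for the example.

```python
# Illustrative sketch of Sobol-based feature selection (not the paper's code).
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from SALib.sample import saltelli
from SALib.analyze import sobol

# Synthetic data: 10 features, of which only the first 5 drive the target.
X, y = make_friedman1(n_samples=2000, n_features=10, random_state=0)

# Train the model once on the full feature set (as embedded methods do).
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Define the sensitivity problem over the observed feature ranges.
problem = {
    "num_vars": X.shape[1],
    "names": [f"x{i}" for i in range(X.shape[1])],
    "bounds": [[X[:, i].min(), X[:, i].max()] for i in range(X.shape[1])],
}

# Saltelli sampling plus Sobol analysis of the trained model's predictions:
# every feature (and feature interactions) is scored without retraining.
samples = saltelli.sample(problem, 1024)
preds = model.predict(samples)
Si = sobol.analyze(problem, preds)

# Rank features by total-order index ST; the 0.05 cutoff is an assumed
# threshold chosen for illustration only.
ranking = np.argsort(Si["ST"])[::-1]
selected = [problem["names"][i] for i in ranking if Si["ST"][i] > 0.05]
print("Selected features:", selected)
```

The single call to `model.predict` over the Saltelli sample is the only model evaluation required, which is where the computational advantage over wrapper methods, which retrain the model for each candidate subset, would come from.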
