Abstract

Feature selection (FS) is an important pre-processing technique in machine learning and data mining. It aims to select a small subset of relevant and informative features from the original feature space, which may contain many irrelevant, redundant, and noisy features. Feature selection usually leads to better performance, better interpretability, and lower computational cost. In the literature, FS methods are categorized into three main approaches: filter, wrapper, and embedded methods. In this paper, we introduce a new feature selection method called graph feature selection (GFS). The main steps of GFS are as follows. First, we create a weighted graph in which each node corresponds to a feature and the weight between two nodes is computed from a matrix of individual and pairwise scores obtained with a decision tree classifier. Second, at each iteration, we split the graph into two random partitions with the same number of nodes, then repeatedly move the worst node from one partition to the other until the global modularity converges. Third, from the final best partition, we select the top-ranked features according to a newly proposed variable importance criterion. The results of GFS are compared with those of three well-known feature selection algorithms on nine benchmark datasets. The proposed method proves effective at identifying the most informative feature subset.
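
To make the three steps concrete, the minimal Python sketch below outlines a GFS-style pipeline. It is not the authors' implementation: the pair-improvement edge weight, the use of networkx's greedy modularity communities in place of the paper's two-partition node-swapping procedure, and the per-feature importance proxy used for the final ranking are all illustrative assumptions.

    # Sketch of a GFS-style pipeline (illustrative, not the paper's code).
    import itertools
    import numpy as np
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    n_features = X.shape[1]

    def score(cols):
        """Cross-validated accuracy of a decision tree on a column subset."""
        clf = DecisionTreeClassifier(max_depth=3, random_state=0)
        return cross_val_score(clf, X[:, cols], y, cv=3).mean()

    # Step 1: weighted graph from individual and pairwise decision tree scores.
    individual = [score([i]) for i in range(n_features)]
    G = nx.Graph()
    G.add_nodes_from(range(n_features))
    for i, j in itertools.combinations(range(n_features), 2):
        # Hypothetical weighting: how much the pair improves on its best member.
        w = score([i, j]) - max(individual[i], individual[j])
        if w > 0:
            G.add_edge(i, j, weight=w)

    # Step 2: modularity-based partition (a stand-in for the paper's
    # iterative two-partition node-swapping procedure).
    communities = greedy_modularity_communities(G, weight="weight")
    best = max(communities, key=lambda c: np.mean([individual[i] for i in c]))

    # Step 3: rank features in the best partition by a proxy importance
    # (the paper defines its own variable importance criterion).
    selected = sorted(best, key=lambda i: individual[i], reverse=True)[:5]
    print("Selected features:", selected)

The swap-until-convergence step and the exact importance criterion would replace the stand-ins above; the sketch only fixes the overall structure of the pipeline described in the abstract.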
