Abstract

Complexity and high dimensionality are inherent concerns of big data, and feature selection has gained prime importance as a means of coping with them by reducing the dimensionality of datasets. The trade-off between maximizing classification accuracy and minimizing the number of selected features remains an open problem. Recently, Monte Carlo Tree Search (MCTS)-based techniques have achieved great success in feature selection by constructing a binary feature selection tree and efficiently focusing on the most valuable features in the feature space. However, such approaches face a trade-off between the depth of the tree search and the number of simulations: with a limited number of simulations, the tree might not reach sufficient depth, biasing feature subset selection towards randomness. In this paper, a new feature selection algorithm is proposed in which multiple feature selection trees are built iteratively in a recursive fashion. The state space of every successor feature selection tree is smaller than that of its predecessor, which increases the impact of the tree search in selecting the best features while the number of MCTS simulations stays fixed. Experiments are performed on 16 benchmark datasets for validation, and performance is compared with state-of-the-art methods from the literature in terms of both classification accuracy and feature selection ratio.

Highlights

  • With the abundance of huge data around, more sophisticated methods are required to handle it. Among the various techniques, feature selection has gained much attention from researchers, mainly because of the high dimensionality of big datasets

  • We extend the idea of MOTiFS and propose a recursive framework to take full advantage of tree search for optimal feature selection

  • The idea is based on the intuition that the state space of every successor feature selection tree is smaller than that of its predecessor, increasing the impact of the tree search in selecting the best features while keeping the Monte Carlo Tree Search (MCTS) simulations fixed during each recursion



Introduction

With the abundance of huge data around, more sophisticated methods are required to handle it. In a limited number of MCTS simulations, the search tree might not reach sufficient depth, inducing a bias towards randomness in feature subset selection. This observation served as the catalyst for this study. The idea is based on the intuition that the state space of every successor feature selection tree is smaller than that of its predecessor, increasing the impact of the tree search in selecting the best features while the MCTS simulation budget stays fixed during each recursion. The algorithm starts with the full feature set F as the initial input and builds a series of feature selection trees, each producing the best feature subset (Fbest) as output after S MCTS simulations.
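The paper's full procedure builds a binary feature selection tree and searches it with MCTS (as in MOTiFS); the sketch below captures only the recursive outer loop, using a random-subset sampler as a stand-in for one MCTS pass. The names search_once and recursive_feature_selection, and the evaluate callable, are hypothetical illustrations rather than the authors' API. The intuition it demonstrates: a tree over n features spans 2^n candidate subsets, so each recursion that shrinks n lets the same budget of S simulations cover the remaining space far more densely.

    import random

    def search_once(features, simulations, evaluate):
        # Stand-in for one MCTS pass over a binary feature selection tree.
        # We sample `simulations` random subsets and keep the best one; the
        # paper instead uses MCTS to bias sampling towards promising branches.
        best_subset, best_reward = list(features), evaluate(features)
        for _ in range(simulations):
            subset = [f for f in features if random.random() < 0.5]
            if not subset:
                continue
            reward = evaluate(subset)
            if reward > best_reward:
                best_subset, best_reward = subset, reward
        return best_subset, best_reward

    def recursive_feature_selection(features, simulations, evaluate):
        # Recursive framework: each pass searches only the best subset found
        # by its predecessor, so the fixed simulation budget covers a
        # progressively smaller state space more thoroughly.
        current = list(features)
        best_subset, best_reward = search_once(current, simulations, evaluate)
        while len(best_subset) < len(current):
            current = best_subset
            subset, reward = search_once(current, simulations, evaluate)
            if reward <= best_reward:
                break  # no improvement on the smaller space; stop recursing
            best_subset, best_reward = subset, reward
        return best_subset

    # Toy usage with a synthetic relevance score (hypothetical data).
    relevance = {0: 0.9, 1: 0.1, 2: 0.8, 3: 0.05, 4: 0.7}
    score = lambda s: sum(relevance[f] for f in s) - 0.2 * len(s)
    print(recursive_feature_selection(list(relevance), 200, score))

In this toy scorer the per-feature penalty of 0.2 makes the weakly relevant features 1 and 3 net negative, so the recursion tends to settle on the subset {0, 2, 4}; a real reward would instead combine classifier accuracy with a subset-size term.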

Related Work
The Recursive Procedure
Feature Subset Generation
Reward Calculation and Backpropagation
Datasets
Experimental Setting
Comparison with MOTiFS and H-MOTiFS
Comparison with State-of-the-Art Methods
Non-Parametric Statistical Tests
Conclusions