Abstract
Feature selection is a preliminary step in machine learning and data mining. It identifies the most important and relevant features within a dataset by eliminating redundant or irrelevant ones. The benefits include improved performance in terms of higher prediction accuracy, reduced computational complexity, and more easily interpretable models. In this paper, we present a novel framework that investigates the role of Monte Carlo tree search (MCTS) in feature selection for very high-dimensional datasets. We construct a binary feature selection tree in which each node represents one of two feature states: selected or not selected. The search starts with an empty root node, reflecting that no feature is selected. The search tree is then expanded by adding nodes incrementally through MCTS-based simulations. Following the tree and default policies, every iteration generates an initial feature subset, from which a filter selects the top k features to form the candidate feature subset. The classification accuracy serves as the goodness, or reward, of the candidate feature subset and is propagated backward to the root node along the active path. Finally, the candidate subset with the highest reward is selected as the best feature subset. Experiments on 30 real-world datasets, including 14 very high-dimensional microarray datasets, and comparisons with state-of-the-art methods from the literature demonstrate the efficacy, validity, and significance of the proposed method.
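The search loop the abstract describes can be sketched in a few dozen lines. The following is a minimal, self-contained illustration, not the authors' implementation: a binary include/exclude tree is grown by MCTS, a rollout completes each path into an initial subset, a filter keeps the top k features, and the reward is backed up along the active path. The filter scores, the toy reward function (standing in for classification accuracy on a real dataset), and all constants here are hypothetical.

```python
import math
import random

random.seed(0)

N_FEATURES = 8
RELEVANT = {1, 3, 5}   # toy ground-truth informative features (illustrative only)
TOP_K = 3              # the filter keeps the k highest-scoring features

# Stand-in filter scores: relevant features score higher by construction.
FILTER_SCORE = [random.random() + (2.0 if i in RELEVANT else 0.0)
                for i in range(N_FEATURES)]

def reward(subset):
    """Toy stand-in for the classification accuracy of a candidate subset."""
    hits = len(set(subset) & RELEVANT)
    return hits / len(RELEVANT) - 0.05 * max(0, len(subset) - len(RELEVANT))

class Node:
    def __init__(self, depth, parent=None):
        self.depth, self.parent = depth, parent
        self.children = {}            # decision (True = select feature) -> Node
        self.visits, self.value = 0, 0.0

def ucb(child, parent, c=1.4):
    """UCB1 score used by the tree policy."""
    if child.visits == 0:
        return float("inf")
    return (child.value / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def search(iterations=500):
    root, best = Node(0), (-1.0, [])
    for _ in range(iterations):
        node, decisions = root, []
        # Tree policy: descend by UCB1 while the node is fully expanded.
        while node.depth < N_FEATURES and len(node.children) == 2:
            d = max((True, False), key=lambda a: ucb(node.children[a], node))
            node = node.children[d]; decisions.append(d)
        # Expansion: add one unexplored child (incremental tree growth).
        if node.depth < N_FEATURES:
            d = random.choice([a for a in (True, False) if a not in node.children])
            node.children[d] = Node(node.depth + 1, node)
            node = node.children[d]; decisions.append(d)
        # Default policy: random include/exclude for the remaining features.
        while len(decisions) < N_FEATURES:
            decisions.append(random.random() < 0.5)
        initial = [i for i, d in enumerate(decisions) if d]
        # Filter step: keep the top-k features of the initial subset.
        candidate = sorted(initial, key=lambda i: FILTER_SCORE[i],
                           reverse=True)[:TOP_K]
        r = reward(candidate)
        best = max(best, (r, sorted(candidate)))
        # Backpropagation: update statistics along the active path.
        while node is not None:
            node.visits += 1; node.value += r
            node = node.parent
    return best

best_reward, best_subset = search()
print(best_subset, round(best_reward, 2))
```

With the toy reward above, the search quickly concentrates on paths that include the three informative features; in the real framework the reward would come from training a classifier on the candidate subset.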
Highlights
In the present era of big data, most datasets are high-dimensional, ranging from a few hundred to thousands of features.
Monte Carlo tree search (MCTS) is deployed in conjunction with a hybrid of filter and wrapper methods.
H-MOTiFS (Hybrid Monte Carlo Tree Search based Feature Selection): we propose a novel framework based on MCTS to deal with very high-dimensional datasets, with the objective of achieving high accuracy with reduced dimensions.
Summary
In the present era of big data, most datasets are high-dimensional, ranging from a few hundred to thousands of features. The objective of a feature selection algorithm is to find the most significant features while maintaining the underlying structure of the dataset. This helps build better predictive models by achieving high accuracy and reduced time complexity. Meta-heuristic approaches such as Particle Swarm Optimization (PSO) [15]–[18], Bat Algorithms (BA) [19], [20], Ant Colony Optimization (ACO) [21], [22], and Multi-Objective Evolutionary Algorithms [23], [24] belong to this category. Although these meta-heuristic approaches have opened up new horizons in feature selection, they are still in their infancy and need more thorough investigation and research. The significance of MCTS is investigated, and a novel hybrid framework, referred to as H-MOTiFS, is proposed for feature selection in very high-dimensional datasets.