Abstract
Feature selection is a widespread preprocessing step in data mining. One of its purposes is to reduce the number of features in the original dataset in order to improve a predictive model's performance. Despite the benefits of feature selection for classification, to the best of our knowledge, few studies in the literature address feature selection in the hierarchical classification context. This paper proposes a novel feature selection method based on the general variable neighborhood search metaheuristic, combining a filter step and a wrapper step, wherein a global model hierarchical classifier evaluates feature subsets. We ran computational experiments on twelve datasets from the protein and image domains to validate the effect of the proposed algorithm on classification performance when using two global hierarchical classifiers proposed in the literature. Statistical tests showed that using our method for feature selection led to predictive performance that was consistently better than or equivalent to that obtained with all features, with the added benefit of reducing the number of features needed, which supports its effectiveness in the hierarchical classification scenario.
Highlights
Data mining applications have become essential in recent years due to the massive increase in data generation and storage
We propose an algorithm that uses a variation of the Variable Neighborhood Search (VNS) [20] metaheuristic, called General Variable Neighborhood Search (GVNS) [21], that applies the Basic Variable Neighborhood Descent (B-VND) [22] procedure as a local search method
In this paper, we presented a novel feature selection method tailored for global model hierarchical classifiers
Summary
Data mining applications have become essential in recent years due to the massive increase in data generation and storage. A few recent approaches that use a set of flat classifiers rely on recursive regularization techniques that take into account the hierarchical information of classes (e.g., parent-child, sibling, and graph relations) [6], [7]. Other ranking-based methods adapt popular filter feature selection algorithms to take into account the hierarchical structure of classes [9], [10]. Unlike these studies, we propose a feature selection approach designed for global model hierarchical classifiers, dealing directly with the class hierarchy relations. Our method jointly exploits a filter-based approach adapted to consider the hierarchical structure of classes and a search-based metaheuristic technique to find the best subset of features.
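To make the search component more concrete, the following is a minimal Python sketch of a GVNS loop that uses a basic VND as its local search over feature subsets. It is an illustration under stated assumptions, not the implementation from the paper: the two neighborhoods (flip and swap), the shake operator, the parameters k_max and max_iter, and the toy_score function are all hypothetical. In the proposed method the score would come from a global model hierarchical classifier (the wrapper step), and the initial subset would come from the hierarchy-aware filter step.

```python
import random
from typing import Callable, FrozenSet, Iterator, List

Subset = FrozenSet[int]
Score = Callable[[Subset], float]


def flip_neighbors(subset: Subset, n_features: int) -> Iterator[Subset]:
    """Hypothetical neighborhood 1: add or remove a single feature."""
    for f in range(n_features):
        yield subset ^ {f}


def swap_neighbors(subset: Subset, n_features: int) -> Iterator[Subset]:
    """Hypothetical neighborhood 2: replace one selected feature with an unselected one."""
    outside = [f for f in range(n_features) if f not in subset]
    for f_in in subset:
        for f_out in outside:
            yield (subset - {f_in}) | {f_out}


def basic_vnd(start: Subset, score: Score, neighborhoods: List, n_features: int) -> Subset:
    """Basic VND: best-improvement search, returning to the first neighborhood on success."""
    best, best_score = start, score(start)
    k = 0
    while k < len(neighborhoods):
        candidate = max(neighborhoods[k](best, n_features), key=score, default=best)
        cand_score = score(candidate)
        if cand_score > best_score:
            best, best_score, k = candidate, cand_score, 0
        else:
            k += 1
    return best


def shake(subset: Subset, k: int, n_features: int, rng: random.Random) -> Subset:
    """Perturb the incumbent by flipping k randomly chosen features."""
    flipped = rng.sample(range(n_features), min(k, n_features))
    return subset ^ frozenset(flipped)


def gvns(initial: Subset, score: Score, n_features: int,
         k_max: int = 3, max_iter: int = 20, seed: int = 0) -> Subset:
    """GVNS: shake the incumbent, then refine the perturbed solution with basic VND."""
    rng = random.Random(seed)
    neighborhoods = [flip_neighbors, swap_neighbors]
    best = basic_vnd(initial, score, neighborhoods, n_features)
    for _ in range(max_iter):
        k = 1
        while k <= k_max:
            candidate = basic_vnd(shake(best, k, n_features, rng),
                                  score, neighborhoods, n_features)
            if score(candidate) > score(best):
                best, k = candidate, 1
            else:
                k += 1
    return best


# Toy stand-in for the wrapper evaluation (assumption): features 0, 3 and 5
# are "relevant", and every other selected feature costs a small penalty.
RELEVANT = frozenset({0, 3, 5})


def toy_score(subset: Subset) -> float:
    return len(subset & RELEVANT) - 0.1 * len(subset - RELEVANT)


# The initial subset {1, 2} stands in for the output of a filter step.
best = gvns(frozenset({1, 2}), toy_score, n_features=8)
print(sorted(best))  # prints [0, 3, 5]
```

With a real classifier as the scoring function, each call to score is expensive, so caching the scores of already evaluated subsets would likely be worthwhile.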