Abstract

Feature selection is a widespread preprocessing step in the data mining field. One of its purposes is to reduce the number of original dataset features to improve a predictive model’s performance. Despite the benefits of feature selection for the classification task, to the best of our knowledge, few studies in the literature address feature selection in the hierarchical classification context. This paper proposes a novel feature selection method based on the general variable neighborhood search metaheuristic, combining a filter and a wrapper step, wherein a global model hierarchical classifier evaluates feature subsets. We ran computational experiments on twelve datasets from the protein and image domains to assess the effect of the proposed algorithm on classification performance when using two global hierarchical classifiers proposed in the literature. Statistical tests showed that using our method for feature selection led to predictive performance that was consistently better than or equivalent to that obtained with all features, while reducing the number of features needed, which supports its suitability for the hierarchical classification scenario.

Highlights

  • Data mining applications have become essential in recent years due to the massive increase in data generation and storage

  • We propose an algorithm that uses a variation of the Variable Neighborhood Search (VNS) [20] metaheuristic, called General Variable Neighborhood Search (GVNS) [21], that applies the Basic Variable Neighborhood Descent (B-VND) [22] procedure as a local search method

  • In this paper, we presented a novel feature selection method tailored for global model hierarchical classifiers

Summary

INTRODUCTION

Data mining applications have become essential in recent years due to the massive increase in data generation and storage. A few recent approaches that use a set of flat classifiers have proposed techniques based on recursive regularization that take into account the hierarchical information of classes (e.g., parent-child, sibling, and graph relations) [6], [7]. Other ranking-based methods adapt popular existing filter feature selection algorithms to take into account the hierarchical structure of classes [9], [10]. Unlike the previously mentioned studies, we propose a feature selection approach designed for global model hierarchical classifiers, dealing directly with the class hierarchy relations. Our method jointly exploits a filter-based approach adapted to consider the hierarchical structure of classes and a search-based metaheuristic technique to find the best subset of features.
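
To make the search component more concrete, the following is a minimal sketch of a GVNS loop that uses a basic VND as its local search over binary feature masks. It is not the authors' implementation. The two neighborhoods (`best_single_flip`, `best_swap`), the parameters `k_max` and `iterations`, and the scorer `toy_score` are all hypothetical choices made for illustration. In the paper's setting, the evaluation function would be the performance of a global hierarchical classifier on the selected features, and the initial mask would come from the filter step; here both are replaced by toy stand-ins.

```python
import random

def flip(solution, indices):
    """Return a copy of the 0/1 feature mask with the given positions toggled."""
    out = list(solution)
    for i in indices:
        out[i] = 1 - out[i]
    return tuple(out)

def shake(solution, k, rng):
    """Shaking: jump to a random neighbor by toggling k distinct features."""
    return flip(solution, rng.sample(range(len(solution)), k))

def best_single_flip(solution, evaluate):
    """Hypothetical neighborhood 1: best mask reachable by toggling one feature."""
    return max((flip(solution, [i]) for i in range(len(solution))), key=evaluate)

def best_swap(solution, evaluate):
    """Hypothetical neighborhood 2: swap one selected feature for one unselected."""
    ones = [i for i, bit in enumerate(solution) if bit]
    zeros = [i for i, bit in enumerate(solution) if not bit]
    candidates = [flip(solution, [i, j]) for i in ones for j in zeros]
    return max(candidates, key=evaluate) if candidates else solution

NEIGHBORHOODS = [best_single_flip, best_swap]

def vnd(solution, evaluate):
    """Basic VND: return to the first neighborhood after every improvement."""
    k = 0
    while k < len(NEIGHBORHOODS):
        candidate = NEIGHBORHOODS[k](solution, evaluate)
        if evaluate(candidate) > evaluate(solution):
            solution, k = candidate, 0
        else:
            k += 1
    return solution

def gvns(initial, evaluate, k_max=3, iterations=20, seed=0):
    """GVNS: shake in neighborhood k, refine with VND, accept only improvements."""
    rng = random.Random(seed)
    best = vnd(initial, evaluate)
    for _ in range(iterations):
        k = 1
        while k <= k_max:
            candidate = vnd(shake(best, k, rng), evaluate)
            if evaluate(candidate) > evaluate(best):
                best, k = candidate, 1
            else:
                k += 1
    return best

# Toy stand-in for the wrapper evaluation (an assumption, not a classifier):
# reward hypothetical "relevant" features and lightly penalize subset size.
RELEVANT = {0, 2, 5}

def toy_score(mask):
    selected = {i for i, bit in enumerate(mask) if bit}
    return 10 * len(selected & RELEVANT) - len(selected)

result = gvns((0,) * 8, toy_score)
```

With this toy scorer, the search starts from an empty mask and `result` marks which of the eight features are kept. Swapping `toy_score` for a function that trains and scores a classifier on the masked features would turn the sketch into a wrapper-style selector, at a much higher evaluation cost per candidate.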

BACKGROUND
FEATURE SELECTION IN CLASSIFICATION
PROBLEM STATEMENT
SOLUTION REPRESENTATION AND EVALUATION
BUILDING AN INITIAL SOLUTION
GVNS APPROACH TO SOLVE FSHC
EXPERIMENTAL RESULTS
DATASET DESCRIPTION
PARAMETER SETTINGS
COMPUTATIONAL RESULTS
CONCLUSIONS AND FUTURE WORK