Abstract

In Data Mining, the preprocessing step offers a considerable diversity of candidate algorithms for selecting important features according to some criterion. This broad availability of algorithms that perform the Feature Selection task makes it difficult to choose, a priori, the most promising one for a particular problem. In this paper, we propose and evaluate a new architecture for the recommendation of Feature Selection algorithms based on Metalearning. Our framework is very flexible, since users can adapt it to their own needs. This flexibility is one of the main advantages of our proposal over other approaches in the literature, which involve steps that cannot be adapted to the user's local requirements. Furthermore, it combines several concepts from intelligent systems, including Machine Learning and Data Mining, with topics derived from expert systems, such as user- and data-driven knowledge, together with meta-knowledge. This set of solutions, coupled with leading-edge technologies, allows our architecture to be integrated into any information system, automating services and reducing human effort during the process. Regarding the Metalearning process, our framework considers several types of properties inherent to the data sets, as well as Feature Selection algorithms based on various information, distance, dependence, and consistency measures. The quality of the Feature Selection methods was estimated according to a multicriteria performance measure, which guided the ranking of these algorithms during the construction of the metabases. Proposed by the authors of this work, this multicriteria performance measure combines any three measurements into a single one, creating an interesting and powerful tool not only for evaluating FS algorithms but also for any context where measures must be combined for maximization or minimization.
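A minimal sketch of such a three-measure combination is shown below. The paper's actual formula is not reproduced here; the measure names (accuracy, runtime, number of retained features), the min-max normalisation, the bounds, and the equal weights are all assumptions made for illustration only.

```python
# Hypothetical sketch of a multicriteria combination of three measures.
# Measures to maximise (accuracy) and to minimise (runtime, retained
# feature count) are normalised to [0, 1] and aggregated by a weighted sum.

def normalise(value, lo, hi, maximise=True):
    """Scale value to [0, 1]; invert the scale when the measure is minimised."""
    if hi == lo:
        return 1.0
    scaled = (value - lo) / (hi - lo)
    return scaled if maximise else 1.0 - scaled

def multicriteria_score(acc, runtime, n_selected, bounds,
                        weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine accuracy (maximise), runtime and retained-feature count
    (both minimise) into a single score in [0, 1]."""
    parts = (
        normalise(acc, *bounds["acc"], maximise=True),
        normalise(runtime, *bounds["runtime"], maximise=False),
        normalise(n_selected, *bounds["n_selected"], maximise=False),
    )
    return sum(w * p for w, p in zip(weights, parts))
```

Scoring each candidate Feature Selection algorithm this way yields a single number per algorithm, so ranking them for the metabase reduces to sorting by score.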
The recommendation models, represented by decision trees induced from the training metabases, allowed us to see under what circumstances one Feature Selection algorithm outperforms another and which aspects of the data most influence the performance of these algorithms. Nevertheless, if the user wishes, any other learning algorithm may be used to induce the recommendation model; this versatility is another strong point of the proposal. Results show that by characterizing the data through statistical, information, and complexity measures, it is possible to reach an accuracy higher than 90%. Besides yielding recommendation models that are interpretable and robust to overfitting, the developed architecture is less computationally expensive than approaches recently proposed in the literature.
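The recommendation step can be sketched as a lookup over a metabase that maps dataset meta-features to the best-ranked Feature Selection algorithm. The framework described above induces decision trees for this; to keep the sketch self-contained, a nearest-neighbour lookup plays the role of the recommendation model here, and every meta-feature name, metabase entry, and algorithm label below is a made-up toy value, not data from the paper.

```python
# Illustrative metalearning recommendation sketch (toy values throughout).
import math

# Metabase: dataset meta-features -> best-ranked FS algorithm (hypothetical).
METABASE = [
    ({"n_features": 10, "class_entropy": 0.4}, "ReliefF"),
    ({"n_features": 500, "class_entropy": 0.9}, "InfoGain"),
    ({"n_features": 50, "class_entropy": 0.2}, "CFS"),
]

def distance(a, b):
    """Euclidean distance over shared meta-feature keys. In a real setting
    the meta-features would be normalised so no single one dominates."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

def recommend(meta_features):
    """Return the algorithm of the closest metabase entry (1-NN lookup)."""
    _, best = min(METABASE, key=lambda entry: distance(entry[0], meta_features))
    return best
```

A new dataset is first characterized by the same meta-features used to build the metabase, and the model then recommends the algorithm expected to perform best on it.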

