The optimal combination of feature selection and data discretization: An empirical study

Chih-Fong Tsai, Yu-Chi Chen

doi:10.1016/j.ins.2019.07.091

Abstract

Feature selection and data discretization are two important data pre-processing steps in data mining: the former filters out unrepresentative features, and the latter transforms continuous attributes into discrete ones. In the literature, these two problems have often been studied individually. However, their combination has not been fully explored, even though some real-world datasets may require both feature selection and discretization. In this paper, two different combination orders of feature selection and discretization are examined in terms of classification accuracy and computational time. Specifically, filter, wrapper, and embedded feature selection methods are employed, namely PCA, GA, and C4.5, respectively. For discretization, both supervised (MDLP and ChiMerge) and unsupervised (equal-frequency binning and equal-width binning) discretizers are used. The experimental results, based on 10 UCI datasets, show that for the SVM classifier, performing MDLP first and C4.5 second outperforms the other combinations: it not only requires less computational time but also provides the highest classification accuracy. For the decision tree classifier, performing C4.5 first and MDLP second is recommended.
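To make the idea of "combination order" concrete, the sketch below implements the two unsupervised discretizers named in the abstract (equal-width and equal-frequency binning) and composes them with a feature selector in both orders. It does not reproduce PCA, GA, C4.5, MDLP, or ChiMerge; the selector `top_variance_columns` is a hypothetical stand-in filter (keep the m highest-variance columns), chosen only because it is short and, like the paper's selectors, depends on how the data are represented. All function names are illustrative assumptions, not the authors' code.

```python
from bisect import bisect_right
from statistics import pvariance


def equal_width_bins(values, k):
    """Map each value to one of k intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: everything falls in bin 0
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum lands on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Map each value to one of k intervals holding roughly n/k values each."""
    ordered = sorted(values)
    n = len(ordered)
    # Cut points are the sorted values at positions n/k, 2n/k, ...;
    # tied values can make the bins uneven.
    cuts = [ordered[i * n // k] for i in range(1, k)]
    return [bisect_right(cuts, v) for v in values]


def discretize(rows, k, binner):
    """Apply an unsupervised binner to every column independently."""
    columns = list(zip(*rows))
    binned = [binner(list(col), k) for col in columns]
    return [list(r) for r in zip(*binned)]


def top_variance_columns(rows, m):
    """Hypothetical stand-in filter: indices of the m highest-variance columns."""
    columns = list(zip(*rows))
    ranked = sorted(range(len(columns)),
                    key=lambda j: pvariance(columns[j]), reverse=True)
    return sorted(ranked[:m])


def project(rows, keep):
    """Keep only the selected column indices."""
    return [[r[j] for j in keep] for r in rows]


def select_then_discretize(rows, m, k, binner):
    """Order 1: choose features on raw values, then discretize them."""
    keep = top_variance_columns(rows, m)
    return keep, discretize(project(rows, keep), k, binner)


def discretize_then_select(rows, m, k, binner):
    """Order 2: discretize every feature, then choose features on the bins."""
    binned = discretize(rows, k, binner)
    keep = top_variance_columns(binned, m)
    return keep, project(binned, keep)


if __name__ == "__main__":
    # Toy data: column 0 has one outlier, column 1 is evenly spread.
    rows = [[0, 1], [0, 2], [0, 3], [0, 4], [0, 5], [10, 6]]
    print(select_then_discretize(rows, 1, 3, equal_width_bins))
    # ([0], [[0], [0], [0], [0], [0], [2]])
    print(discretize_then_select(rows, 1, 3, equal_width_bins))
    # ([1], [[0], [0], [1], [1], [2], [2]])
```

On this toy data the two orders keep different features: the outlier gives column 0 the larger raw variance, but after binning column 1 is the more varied one. This is only an illustration of why the order can matter; the paper's actual findings come from its PCA/GA/C4.5 and MDLP/ChiMerge experiments.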

Journal: Information Sciences
Publication Date: Jul 26, 2019
Citations: 56

Similar Papers

Genetic algorithms in feature and instance selection
Chih-Fong Tsai ... Chi-Yuan Chu
Knowledge-Based Systems | VOL. 39 | 27 Nov 2012

An enhanced binary Rat Swarm Optimizer based on local-best concepts of PSO and collaborative crossover operators for feature selection
Mohammed A Awadallah ... Raed Abu Zitar
Computers in Biology and Medicine | VOL. 147 | 02 Jun 2022

Evolutionary feature and instance selection for traffic sign recognition
Zong-Yao Chen ... Chih-Fong Tsai
Computers in Industry | VOL. 74 | 11 Sep 2015

Identification of important features and data mining classification techniques in predicting employee absenteeism at work
Amal Al-Rasheed
International Journal of Electrical and Computer Engineering (IJECE) | VOL. 11 | 01 Oct 2021