Abstract

Redundant and irrelevant features in datasets decrease classification accuracy and increase the computational time of classification algorithms, the risk of overfitting, and the complexity of the underlying classification model. Feature selection is a preprocessing technique used with classification algorithms to select the relevant features. Several approaches that combine Rough Set Theory (RST) with Nature Inspired Algorithms (NIAs) have been used successfully for feature selection. However, due to the inherent limitations of RST for some data types and the inefficient convergence of NIAs on high dimensional datasets, these approaches have mainly focused on a specific type of low dimensional nominal dataset. This paper proposes a new filter feature selection approach based on Binary Cuckoo Search (BCS) and RST, which is more efficient for low and high dimensional nominal, mixed and numerical datasets. It enhances BCS by developing new initialization and global update mechanisms to increase the efficiency of convergence on high dimensional datasets. It also develops a more efficient objective function for numerical, mixed and nominal datasets. The proposed approach was validated on 16 benchmark datasets (4 nominal, 4 mixed and 8 numerical) drawn from the UCI repository. It was also evaluated against standard BCS; five NIAs with fuzzy RST approaches; two popular traditional FS approaches; and multi-objective evolutionary, genetic, and Particle Swarm Optimization (PSO) algorithms. Decision tree and naive Bayes algorithms were used to measure the classification performance of the proposed approach. The results show that the proposed approach achieved improved classification accuracy while minimizing the number of features compared to other state-of-the-art methods. The code is available at https://github.com/abualia4/EBCS .

Highlights

  • The increasing availability and use of data acquisition technologies is leading to exponential collection of data [1]

  • A difference in accuracy is considered significant when it exceeds 5% [100], and the accuracy of different methods is considered equal when the difference is less than 1% [85]

  • No significant difference was noted between average precision and average recall, so accuracy was sufficient to evaluate the classification performance for the datasets used in the experiments


Summary

INTRODUCTION

The increasing availability and use of data acquisition technologies is leading to exponential collection of data [1]. RSTDD approaches have shown several advantages over other methods, including greater efficiency, faster computation, and no need for preliminary or additional information about the data [36]. Although these approaches usually work well for nominal datasets, they suffer from several efficiency limitations when applied to mixed and numerical datasets as well as high dimensional datasets. The paper proposes a new, more efficient filter feature selection approach for classification, named Enhanced Binary Cuckoo Search (EBCS), for three types of datasets: nominal, mixed and numerical. This includes the development of a new EBCS objective function that produces a reduced feature subset achieving maximum classification accuracy with a minimum number of features for nominal, mixed and numerical datasets by balancing RST, the number of features and frequent values.
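To make the RST side of such an objective concrete, the sketch below computes the classic rough-set dependency degree, gamma_P(D) = |POS_P(D)| / |U|, for a nominal dataset and combines it with a subset-size penalty. It is only an illustration under stated assumptions: the function names, the toy data and the weight of 0.9 are hypothetical, and the weighted sum is a generic trade-off, not the EBCS objective function described in the paper (which also accounts for frequent values).

```python
from collections import defaultdict


def dependency_degree(rows, feature_idx, labels):
    """Classic rough-set dependency degree: |POS_P(D)| / |U|."""
    if not rows:
        raise ValueError("rows must be non-empty")
    # Group objects into equivalence classes by their values on the chosen features.
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in feature_idx)
        classes[key].append(label)
    # A class belongs to the positive region when all its objects share one label.
    positive = sum(len(lbls) for lbls in classes.values() if len(set(lbls)) == 1)
    return positive / len(rows)


def toy_objective(rows, feature_idx, labels, n_features, weight=0.9):
    """Hypothetical trade-off between dependency and subset size.

    This is an assumed, generic weighted sum, NOT the paper's objective.
    """
    gamma = dependency_degree(rows, feature_idx, labels)
    size_term = 1.0 - len(feature_idx) / n_features
    return weight * gamma + (1.0 - weight) * size_term


# Made-up nominal data: three features per object, one class label each.
rows = [
    ("sunny", "hot", "high"),
    ("sunny", "hot", "low"),
    ("rainy", "mild", "high"),
    ("rainy", "hot", "high"),
]
labels = ["no", "yes", "no", "no"]

for subset in ([0], [2], [0, 2]):
    print(subset, dependency_degree(rows, subset, labels),
          round(toy_objective(rows, subset, labels, n_features=3), 4))
```

On this toy data, feature 2 alone already determines the label, so the smaller subset [2] scores higher than [0, 2] even though both reach the same dependency degree. That is the kind of balance between RST dependency and feature count the summary refers to.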

RELATED WORK
EVALUATION METHODOLOGY
RESULTS AND DISCUSSIONS
ANALYSIS OF COMPUTATIONAL TIME AND NUMBER OF ITERATIONS
COMPARISON BETWEEN EBCS AND HYBRID NIA WITH FUZZY RST APPROACHES
Objective
CONCLUSION AND FUTURE WORK
