Abstract

The rapid progress of modern technologies generates massive volumes of high-throughput data, called Big Data, which provide opportunities to find new insights using machine learning (ML) algorithms. Big Data consist of many features (also called attributes); however, not all of these are necessary or relevant, and irrelevant features may degrade the performance of ML algorithms. Feature selection (FS) is therefore an essential preprocessing step for reducing the dimensionality of a dataset. Evolutionary algorithms (EAs) are widely used as search algorithms for FS. Using classification accuracy alone as the objective function, EAs such as the cooperative co-evolutionary algorithm (CCEA) can achieve high accuracy, but often at the cost of retaining many features. Feature selection has two goals: reducing the number of features to decrease computation, and improving classification accuracy. These goals partly conflict, yet both can be pursued with a single objective function. To this end, this paper proposes a penalty-based wrapper objective function. This function is used to evaluate the FS process with CCEA, an approach called Cooperative Co-Evolutionary Algorithm-Based Feature Selection (CCEAFS). Experiments were performed using six widely used classifiers on six datasets from the UCI ML repository, with and without FS. The results indicate that the proposed objective function is efficient at reducing the number of features in the final feature subset without significantly reducing classification accuracy. Across different performance measures, naive Bayes outperforms the other classifiers in most cases when using CCEAFS.
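The abstract describes a penalty-based wrapper objective that trades classification accuracy against subset size, but does not state its exact form here. The following is a minimal sketch under assumed values: the `penalty` coefficient, the function name, and the example accuracies are illustrative choices, not the paper's.

```python
def penalty_fitness(accuracy, n_selected, n_total, penalty=0.05):
    """Penalty-based wrapper objective (illustrative form): classifier
    accuracy minus a cost proportional to the fraction of features kept.
    The penalty weight 0.05 is a hypothetical choice, not the paper's."""
    return accuracy - penalty * (n_selected / n_total)

# A slightly less accurate 5-feature subset can outscore the full 30-feature
# set once the per-feature penalty is taken into account.
full_set = penalty_fitness(accuracy=0.95, n_selected=30, n_total=30)
small_set = penalty_fitness(accuracy=0.94, n_selected=5, n_total=30)
```

With this form, the single scalar score lets an EA pursue both FS goals at once: higher accuracy raises the score, while each retained feature lowers it.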

Highlights

  • A massive volume of data is continuously generated by modern technologies in a variety of sectors, including healthcare, finance, and economics

  • As the objectives of feature selection (FS) are twofold, an appropriate single objective function is required that satisfies both objectives [2]. This objective function can serve as the fitness function for cooperative co-evolutionary algorithm (CCEA)-based FS (CCEAFS), guiding the algorithm to reduce the number of features without significantly decreasing the classification performance of machine learning (ML) classifiers

  • The results indicate that the highest accuracy can be achieved by random forest (RF) using all features in the dataset (97.34%), while naïve Bayes (NB) achieved the lowest accuracy (84.04%)
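The highlights describe a CCEA that evolves cooperating subpopulations toward a small, accurate feature subset. The paper's exact decomposition, operators, and classifier are not reproduced here; the sketch below is a toy illustration under stated assumptions: a synthetic accuracy function stands in for a trained classifier, the feature space is split into two subpopulations, and all parameters (population size, mutation scheme, penalty weight) are illustrative.

```python
import random

random.seed(1)
N = 10                  # total number of features (toy size)
RELEVANT = {0, 2, 5}    # synthetic ground truth: the "useful" features
SPLIT = N // 2          # subpopulation 0 owns bits 0..4, subpopulation 1 owns 5..9

def accuracy(mask):
    # Stand-in for a wrapper classifier's validation accuracy:
    # rises with the share of relevant features selected (purely synthetic).
    return 0.5 + 0.5 * sum(mask[i] for i in RELEVANT) / len(RELEVANT)

def fitness(mask, penalty=0.05):
    # Penalty-based wrapper objective: accuracy minus a cost per kept feature.
    return accuracy(mask) - penalty * sum(mask) / N

def random_part():
    return [random.randint(0, 1) for _ in range(SPLIT)]

pops = [[random_part() for _ in range(20)] for _ in range(2)]
best = [random_part(), random_part()]   # current best collaborator per population

for gen in range(60):
    for p in range(2):
        other = best[1 - p]
        # Each individual is evaluated on the FULL mask formed with the
        # best collaborator from the other subpopulation (cooperative step).
        def full(ind):
            return ind + other if p == 0 else other + ind
        pops[p].sort(key=lambda ind: fitness(full(ind)), reverse=True)
        best[p] = pops[p][0]
        # Elitist refill: keep the top 5, add 3 single-bit-flip mutants of each.
        elites = pops[p][:5]
        children = []
        for e in elites:
            for _ in range(3):
                c = e[:]
                i = random.randrange(SPLIT)
                c[i] = 1 - c[i]
                children.append(c)
        pops[p] = elites + children

final = best[0] + best[1]   # combined feature mask from both subpopulations
```

On this separable toy objective, the cooperative loop quickly converges to a mask that keeps the relevant features and drops the rest, mirroring the highlighted behaviour: fewer features without a significant accuracy loss.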


