Cooperative co-evolution for feature selection in Big Data with random feature grouping

A N M Bazlur Rashid, Mohiuddin Ahmed, Paul Haskell-Dowland, Leslie F Sikos

doi:10.1186/s40537-020-00381-y

https://doi.org/10.1186/s40537-020-00381-y

Journal: Journal of Big Data
Publication Date: Dec 1, 2020
Citations: 10
License type: open-access

Affiliation: Edith Cowan University

Abstract

A massive amount of data is generated with the evolution of modern technologies. This high-throughput data generation results in Big Data, which consists of many features (attributes). However, irrelevant features may degrade the classification performance of machine learning (ML) algorithms. Feature selection (FS) is a technique used to select a subset of relevant features that represent the dataset. Evolutionary algorithms (EAs) are widely used search strategies in this domain. A variant of EAs, called cooperative co-evolution (CC), which uses a divide-and-conquer approach, is a good choice for optimization problems. The existing solutions perform poorly because of limitations such as not considering feature interactions, dealing with only an even number of features, and decomposing the dataset statically. In this paper, a novel random feature grouping (RFG) is introduced with its three variants to dynamically decompose Big Data datasets and to increase the probability of grouping interacting features into the same subcomponent. RFG can be used in CC-based FS processes; the resulting approach is called Cooperative Co-Evolutionary-Based Feature Selection with Random Feature Grouping (CCFSRFG). Experimental analysis was performed using six widely used ML classifiers on seven different datasets from the UCI ML repository and the Princeton University Genomics repository, with and without FS. The experimental results indicate that in most cases [i.e., with naïve Bayes (NB), support vector machine (SVM), k-Nearest Neighbor (k-NN), J48, and random forest (RF)] the proposed CCFSRFG-1 outperforms an existing solution (a CC-based FS, called CCEAFS), CCFSRFG-2, and the use of all features in terms of accuracy, sensitivity, and specificity.
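The idea of randomly decomposing a feature set can be illustrated with a minimal sketch. This is not the paper's RFG algorithm or any of its three variants, which are not specified in this abstract; the function name `random_feature_grouping` and its parameters are hypothetical. It only shows how shuffled feature indices can be split into subcomponents whose sizes differ by at most one, so the number of features need not be even or divisible by the number of groups.

```python
import random

def random_feature_grouping(n_features, n_groups, seed=None):
    """Randomly partition feature indices 0..n_features-1 into n_groups groups.

    Hypothetical illustration only; not the paper's RFG variants.
    """
    if not 1 <= n_groups <= n_features:
        raise ValueError("n_groups must be between 1 and n_features")
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin, so group sizes differ by at most one.
    return [indices[g::n_groups] for g in range(n_groups)]

# 11 features split into 3 subcomponents; a different seed gives a different grouping.
groups = random_feature_grouping(n_features=11, n_groups=3, seed=0)
print(groups)
```

Calling the function again with a different seed yields a different decomposition, which is one simple way to make the grouping dynamic rather than static.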

Highlights

The generation of massive volumes of data in the Big Data era is common in many areas, including, but not limited to, the Internet of Things (IoT), cybersecurity, and healthcare [1].
Following an extensive literature review on problem decomposition approaches using cooperative co-evolution (CC), this paper investigated the application of CC with a dynamic decomposition for feature selection.
The experiments indicated that the feature selection process does not degrade the performance of the classifiers significantly.

Summary

Introduction

The generation of massive volumes of data in the Big Data era is common in many areas, including, but not limited to, the Internet of Things (IoT), cybersecurity, and healthcare [1]. The use of ML classifiers in different application domains, for example, healthcare and cybersecurity, has been studied in the literature [2]. Feature selection is a process that selects a subset of s features from a full set of n features (s < n) in a dataset by removing irrelevant and unimportant features, thereby representing the dataset with fewer features [2]. A search technique initiates the FS process to discover feature subsets. Feature subsets are evaluated by different performance measures, for example, classification accuracy. A validation method at the end of the FS process can test the validity of the selected subset of features.
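The search-and-evaluate loop described above can be sketched as follows. This is an illustration under stated assumptions, not the method used in the paper: the search technique here is plain random search rather than an evolutionary or co-evolutionary algorithm, the evaluator is leave-one-out accuracy of a 1-nearest-neighbour classifier, and the toy dataset and the names `loo_accuracy` and `random_search_fs` are hypothetical. The final validation step on held-out data is omitted.

```python
import random

def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the chosen features."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            # Squared Euclidean distance restricted to the selected features.
            d = sum((xi[f] - xj[f]) ** 2 for f in subset)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)

def random_search_fs(X, y, s, n_trials=50, seed=0):
    """Sample n_trials subsets of s features (s < n) and keep the most accurate one."""
    rng = random.Random(seed)
    n = len(X[0])
    best_subset, best_score = None, -1.0
    for _ in range(n_trials):
        subset = sorted(rng.sample(range(n), s))
        score = loo_accuracy(X, y, subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

# Toy data (assumption): feature 0 separates the classes, features 1-3 are noise.
noise = random.Random(1)
X = [[label + 0.1 * k] + [noise.random() for _ in range(3)]
     for label in (0, 1) for k in range(4)]
y = [label for label in (0, 1) for _ in range(4)]

subset, score = random_search_fs(X, y, s=2)
print(subset, score)
```

Swapping `random_search_fs` for an evolutionary search would change only how candidate subsets are proposed; the evaluation by classification accuracy stays the same.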
