Abstract

Recent discretization-based feature selection methods for high-dimensional data have shown clear advantages by introducing entropy-based cut-points for features, which integrates discretization and feature selection into a single stage. However, current methods usually treat features independently and ignore the interaction between features with cut-points and features without cut-points, which results in information loss. In this paper, we propose a cooperative coevolutionary algorithm based on the genetic algorithm (GA) and particle swarm optimization (PSO) that searches for the feature subsets with and without entropy-based cut-points simultaneously. For the features with cut-points, a ranking mechanism is used to control the probabilities of mutation and crossover in the GA. In addition, a binary-coded PSO is applied to update the indices of the selected features without cut-points. Experimental results on 10 real datasets demonstrate the effectiveness of our algorithm in terms of classification accuracy compared with several state-of-the-art competitors.
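The abstract relies on entropy-based cut-points without spelling out how one is chosen. A common approach (the Fayyad-Irani family of methods) sorts a feature's values and picks the boundary between adjacent distinct values that minimizes the weighted class entropy of the two resulting partitions. The minimal Python sketch below shows only that step for a single feature; the MDL criterion usually used to accept or reject a cut is omitted, and the function names are hypothetical rather than taken from CC-DFS.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_cut_point(values, labels):
    """Return (cut, weighted_entropy) for the binary split of one feature that
    minimizes weighted class entropy, or (None, entropy) if no split helps.

    Hypothetical helper: a plain O(n^2) scan, without the MDL acceptance test.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_ent = None, class_entropy(labels)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # candidate cuts lie only between distinct values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ent = (len(left) * class_entropy(left)
               + len(right) * class_entropy(right)) / n
        if ent < best_ent:
            best_cut, best_ent = (lo + hi) / 2, ent  # midpoint as the cut
    return best_cut, best_ent
```

For example, values `[1.0, 2.0, 3.0, 4.0]` with labels `a, a, b, b` give a cut at 2.5 with weighted entropy 0, because that split separates the two classes perfectly.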

Highlights

  • Feature selection (FS) is an important task in machine learning, aiming to find an optimal subset of features that improves classification [1] or clustering [2,3] performance

  • In this paper, we propose a cooperative coevolutionary discretization-based FS algorithm (CC-DFS), which searches for subsets of features with and without cut-points simultaneously

  • Ten real genetic datasets, available at https://github.com/primekangkang/Genedata, were used to test the performance of the different algorithms

Introduction

Feature selection (FS) is an important task in machine learning, aiming to find an optimal subset of features that improves classification [1] or clustering [2,3] performance. Removing redundant and irrelevant features reduces model complexity and helps avoid overfitting during training. Current FS algorithms can generally be categorized into wrapper and filter methods [4].

In PSO, the swarm consists of a set of particles, each of which represents a feasible solution in the decision space. In each iteration, the velocity of each particle is updated according to its personal best position (pbest) and the global best position (gbest) as follows:
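For reference, the canonical inertia-weight velocity update (Shi and Eberhart) and the sigmoid position rule of binary PSO (Kennedy and Eberhart), the kind of binary-coded PSO the abstract mentions, can be sketched in Python as below. The parameter values (w = 0.7298, c1 = c2 = 1.49618, vmax = 6.0), the function names and the one-bit-per-feature encoding are assumptions for illustration; the exact variant used in CC-DFS may differ.

```python
import math
import random


def update_velocity(v, x, pbest, gbest, w=0.7298, c1=1.49618, c2=1.49618,
                    vmax=6.0, rng=random):
    """Inertia-weight PSO velocity update for one particle, per dimension d:
    v_d <- w*v_d + c1*r1*(pbest_d - x_d) + c2*r2*(gbest_d - x_d), r1, r2 ~ U(0, 1).
    The result is clamped to [-vmax, vmax] (assumed clamping bound)."""
    new_v = []
    for vd, xd, pd, gd in zip(v, x, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        vd = w * vd + c1 * r1 * (pd - xd) + c2 * r2 * (gd - xd)
        new_v.append(max(-vmax, min(vmax, vd)))
    return new_v


def binary_position_update(v, rng=random):
    """Binary PSO rule: bit d becomes 1 with probability sigmoid(v_d).
    Assumed FS encoding: bit d = 1 means feature d is selected."""
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-vd)) else 0 for vd in v]
```

A particle that already sits at both pbest and gbest with zero velocity keeps a velocity of 0, so each of its bits is resampled with probability 0.5; clamping the velocity keeps these probabilities away from 0 and 1 so the swarm retains some exploration.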
