Abstract

Feature selection (FS) is an important data pre-processing technique in classification. It aims to remove redundant and irrelevant features from the data, which reduces the dimensionality of the data and improves the performance of the classifier. Because minimizing the number of selected features and maximizing classification performance are two conflicting goals, FS can be formulated as a bi-objective optimization problem, and evolutionary algorithms (EAs) have proven effective in solving such problems. EAs are population-based metaheuristics, and the quality of the initial population is an important factor affecting their performance: an improper initial population may slow the convergence of an EA or even cause it to become trapped in a local optimum. In this paper, we propose a similarity and mutual information-based initialization method, named SMII, to improve the quality of the initial population. The method determines the distribution of initial solutions based on similarity, and uses mutual information to shield features that are highly correlated with already selected features. In the experiments, we embed SMII, four recent initialization methods, and a traditional random initialization method into NSGA-II, and compare their performance on 15 public datasets. The experimental results show that SMII performs best on most datasets and can effectively improve the performance of the algorithm. Moreover, we compare two other EAs before and after embedding SMII on the 15 datasets; the results further confirm that the proposed method effectively improves the search capability of EAs for FS.
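
The abstract describes two ingredients of SMII: distributing initial solutions based on similarity, and shielding features with high mutual information to already selected features. The sketch below illustrates one way such an initializer could look. It is a minimal, hypothetical reconstruction, not the authors' exact procedure: the function name `smii_like_init`, the redundancy threshold, the choice of MI estimator, and the way subset sizes are spread over the population are all assumptions for illustration.

```python
# Hypothetical sketch of an SMII-style population initializer.
# Assumptions: feature-feature redundancy is measured with sklearn's
# mutual_info_regression, and "distribution of initial solutions" is
# approximated by spreading target subset sizes across the population.
import numpy as np
from sklearn.feature_selection import mutual_info_regression


def smii_like_init(X, pop_size, mi_threshold=0.5, seed=0):
    """Build an initial population of binary feature-selection masks.

    Once a feature is selected for an individual, features whose normalized
    mutual information with it exceeds `mi_threshold` are shielded (masked),
    discouraging redundant features from entering the same solution.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]

    # Pairwise feature-feature mutual information (estimator is an assumption).
    mi = np.zeros((n_features, n_features))
    for j in range(n_features):
        mi[:, j] = mutual_info_regression(X, X[:, j], random_state=seed)
    np.fill_diagonal(mi, 0.0)
    mi /= mi.max() if mi.max() > 0 else 1.0  # normalize to [0, 1]

    # Spread target subset sizes over the population so that initial
    # solutions cover different regions of the objective space.
    sizes = np.linspace(1, n_features, pop_size).astype(int)
    population = np.zeros((pop_size, n_features), dtype=bool)

    for i, k in enumerate(sizes):
        available = np.ones(n_features, dtype=bool)
        for _ in range(k):
            candidates = np.flatnonzero(available)
            if candidates.size == 0:
                break
            j = rng.choice(candidates)
            population[i, j] = True
            # Shield features highly redundant with the feature just selected.
            available &= mi[:, j] <= mi_threshold
            available[j] = False
    return population
```

Such a population could then be passed to NSGA-II (or another EA) in place of random initialization; the rest of the algorithm is unchanged.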
