Unsupervised feature selection based on bio-inspired approaches

Nádia Junqueira Martarelli, Marcelo Seido Nagano

doi:10.1016/j.swevo.2019.100618

Abstract

In recent years, the scientific community has witnessed an explosion in the use of pattern recognition algorithms. However, little attention has been paid to the tasks that precede the execution of these algorithms, namely the preprocessing activities. One of these tasks is dimensionality reduction, which seeks a subset of features that improves the performance of the mining algorithm and reduces its runtime. Although many methods address the problems in pattern recognition algorithms, effective solutions still need to be researched and explored. Hence, this paper aims to address three of the issues surrounding these algorithms. First, we propose adapting a promising meta-heuristic called the biased random-key genetic algorithm, which constructs its initial population randomly. We call this algorithm unsupervised feature selection by biased random-key genetic algorithm I. Next, we propose an approach for building the initial population partly in a deterministic way, and apply this idea in two algorithms, named unsupervised feature selection by particle swarm optimization and unsupervised feature selection by biased random-key genetic algorithm II. Finally, we simulated different datasets to study the effects of relevant and irrelevant attributes, and of noisy and missing data, on the performance of the algorithms. Based on the Wilcoxon rank-sum test, we can state that the proposed algorithms outperform all other methods in different datasets. We also observed that constructing the initial population in a partially deterministic way contributed to the better performance. It should be noted that some methods are more sensitive than others to noisy and missing data, as well as to relevant and irrelevant attributes.

• Particle swarm optimization and biased random-key genetic algorithms are proposed.
• The effects of the initial population construction, and of missing and noisy data, on the performance of the algorithms are investigated.
• Ten simulated datasets are used, containing relevant and irrelevant attributes as well as different percentages of missing and noisy data.
• An exhaustive computational and statistical evaluation is carried out.
• The proposed approaches outperformed other methods.

Full Text