Abstract

Instance-based learning methods, such as the k-nearest neighbor rule, are among the best-performing methods in many classification tasks. Despite their simplicity, they achieve performance comparable to that of much more complex methods. However, one of their drawbacks is the need to store all of the training instances in memory, which, for large datasets, also slows the testing process. As a solution to this problem, instance selection methods, which remove redundant and noisy instances, have been proposed. A common side effect of these methods, however, is a significant reduction in the accuracy of the instance-based learner: the classification performance is significantly worse than that of the k-nearest neighbor rule applied to all instances. In this paper, we propose an evolutionary instance selection algorithm that combines three strategies to avoid this negative side effect. First, it uses the framework of a CHC genetic algorithm, which has proven to be the best-performing method for this task. Second, it allows each instance to be selected more than once, an approach that has also proven useful in previous work. Finally, it uses a local value of k that depends on the nearest neighbor of each test instance. Together, these three strategies achieve better reduction than previous approaches while maintaining the classification performance of the k-nearest neighbor rule. On a large set of 150 real-world problems, our approach is the best-performing among state-of-the-art instance selection algorithms and matches the classification performance of the k-nearest neighbor rule using the whole training set.
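
The abstract names three combined strategies: a CHC genetic algorithm as the search framework, a multi-selection encoding in which an instance may be chosen more than once, and a local value of k taken from the nearest stored prototype of each test instance. The sketch below is a minimal, hypothetical illustration of how these pieces could fit together; it is not the authors' implementation. All names, parameter values (population size, MAX_COPIES, the fitness weight alpha, the per-instance k range), and the specific CHC details shown (HUX crossover, incest prevention, cataclysmic restart) are assumptions made for demonstration.

```python
# A minimal sketch (not the authors' code) of evolutionary instance
# selection combining: (1) a CHC-style genetic algorithm, (2) a
# multi-selection chromosome whose gene counts how many copies of an
# instance are kept, and (3) a local k read from the nearest prototype.
import numpy as np

rng = np.random.default_rng(0)
MAX_COPIES = 3  # assumed cap on how many times one instance may be selected


def knn_predict(Xs, ys, x, k):
    """Majority vote among the k nearest stored (possibly duplicated) instances."""
    d = np.linalg.norm(Xs - x, axis=1)
    return np.bincount(ys[np.argsort(d)[:k]]).argmax()


def decode(chrom, X, y, local_k):
    """Multi-selection decoding: gene value g keeps instance i exactly g
    times, so duplicated instances carry extra voting weight."""
    keep = np.repeat(np.arange(len(X)), chrom)
    return X[keep], y[keep], local_k[keep]


def fitness(chrom, X, y, local_k, alpha=0.5):
    """Assumed fitness: blend of training accuracy (local-k rule) and reduction."""
    Xs, ys, ks = decode(chrom, X, y, local_k)
    if len(Xs) == 0:
        return 0.0
    correct = 0
    for i in range(len(X)):
        d = np.linalg.norm(Xs - X[i], axis=1)
        k = min(int(ks[np.argmin(d)]), len(Xs))  # k stored in the nearest prototype
        correct += int(knn_predict(Xs, ys, X[i], k) == y[i])
    accuracy = correct / len(X)
    reduction = 1.0 - np.count_nonzero(chrom) / len(chrom)
    return alpha * accuracy + (1.0 - alpha) * reduction


def hux(a, b):
    """CHC's HUX crossover: swap exactly half of the differing genes."""
    diff = np.flatnonzero(a != b)
    rng.shuffle(diff)
    swap = diff[: len(diff) // 2]
    c1, c2 = a.copy(), b.copy()
    c1[swap], c2[swap] = b[swap], a[swap]
    return c1, c2


def chc_instance_selection(X, y, pop_size=20, generations=50):
    n = len(X)
    local_k = rng.integers(1, 6, size=n)  # assumed per-instance k values
    pop = rng.integers(0, MAX_COPIES + 1, size=(pop_size, n))
    fit = np.array([fitness(c, X, y, local_k) for c in pop])
    threshold = n // 4  # CHC incest-prevention threshold
    for _ in range(generations):
        order = rng.permutation(pop_size)
        children = []
        for i in range(0, pop_size - 1, 2):
            a, b = pop[order[i]], pop[order[i + 1]]
            if np.count_nonzero(a != b) // 2 > threshold:  # incest prevention
                children.extend(hux(a, b))
        survived_child = False
        if children:
            child_fit = np.array([fitness(c, X, y, local_k) for c in children])
            merged = np.vstack([pop, np.asarray(children)])
            merged_fit = np.concatenate([fit, child_fit])
            best = np.argsort(merged_fit)[::-1][:pop_size]  # elitist survival
            survived_child = bool((best >= pop_size).any())
            pop, fit = merged[best], merged_fit[best]
        if not survived_child:
            threshold -= 1  # population has stagnated
        if threshold <= 0:  # cataclysmic restart around the current best
            elite = pop[np.argmax(fit)]
            mask = rng.random((pop_size, n)) < 0.35
            pop = np.where(mask, rng.integers(0, MAX_COPIES + 1, (pop_size, n)), elite)
            pop[0] = elite  # keep the elite unchanged
            fit = np.array([fitness(c, X, y, local_k) for c in pop])
            threshold = n // 4
    return decode(pop[np.argmax(fit)], X, y, local_k)


if __name__ == "__main__":
    # Tiny synthetic demo: two Gaussian blobs.
    X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
    y = np.array([0] * 30 + [1] * 30)
    Xs, ys, ks = chc_instance_selection(X, y)
    print(f"kept {len(set(map(tuple, Xs)))} distinct instances out of {len(X)}")
```

In this reading, duplicating an instance increases its voting weight, and the reduction term in the fitness rewards chromosomes that keep few distinct instances. The actual algorithm's fitness function, restart schedule, and method of assigning local k values may differ from these assumptions.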
