Abstract

Instance selection is becoming increasingly relevant because of the large amount of data that is constantly being produced in many fields of research. Two basic approaches exist for instance selection: instance selection as a prototype selection method for instance‐based classifiers (such as k‐nearest neighbors) and instance selection for obtaining the training set of classifiers that require a learning process (such as decision trees or neural networks). In this paper, we review the methods that have been developed thus far for the latter approach within the field of evolutionary computation. Different groups of learning algorithms require different instance selectors to suit their learning/search biases. This requirement may render many instance selection algorithms useless if their design philosophy does not suit the problem at hand. Evolutionary algorithms do not assume any structure in the data or any behavior of the classifier; instead, they adapt the instance selection to the performance of the classifier. They are therefore very suitable for training set selection. The main algorithms that have been developed for decision trees, artificial neural networks, and other classifiers are presented. We also discuss the relevant issue of the scalability of these methods to very large datasets. Although current algorithms are useful for fairly large datasets, scaling problems arise when the number of instances reaches the hundreds of thousands or millions.

© 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1, 512–523. DOI: 10.1002/widm.44
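To make the wrapper idea concrete, the following is a minimal, hypothetical sketch of evolutionary instance selection in the spirit the abstract describes: a genetic algorithm evolves a binary mask over the training instances, and fitness is the accuracy of a classifier (here a simple 1‐nearest‐neighbor rule) trained on the selected subset and evaluated on held‐out data, with a small reward for smaller subsets. The data, operators, and parameter values are all illustrative, not taken from the reviewed algorithms.

```python
import random

# Illustrative toy data: two well-separated 2-D clusters (classes 0 and 1),
# plus two mislabeled points that instance selection should ideally discard.
random.seed(0)

def make_point(cx, cy):
    return (cx + random.uniform(-1, 1), cy + random.uniform(-1, 1))

train = [(make_point(0, 0), 0) for _ in range(20)] + \
        [(make_point(4, 4), 1) for _ in range(20)]
train += [(make_point(0, 0), 1), (make_point(4, 4), 0)]   # noisy instances
valid = [(make_point(0, 0), 0) for _ in range(10)] + \
        [(make_point(4, 4), 1) for _ in range(10)]

def knn_predict(subset, x):
    # 1-NN: label of the closest selected training instance
    _, label = min(subset,
                   key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2)
    return label

def fitness(mask):
    # Wrapper fitness: classifier accuracy on held-out data,
    # minus a small penalty proportional to the subset size.
    subset = [t for t, keep in zip(train, mask) if keep]
    if not subset:
        return 0.0
    acc = sum(knn_predict(subset, x) == y for x, y in valid) / len(valid)
    return acc - 0.01 * sum(mask) / len(mask)

def evolve(pop_size=20, gens=30):
    n = len(train)
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:2]                        # elitism: keep the best two
        while len(next_pop) < pop_size:
            a, b = random.sample(pop[:10], 2)     # parents from the top half
            cut = random.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:             # occasional bit-flip mutation
                i = random.randrange(n)
                child[i] ^= 1
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

best_mask = evolve()
```

Because the fitness function only queries the classifier's performance, the same loop works unchanged for any learner plugged into `fitness`, which is exactly why the review argues evolutionary selectors suit training set selection across classifier families.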

