Abstract

The large amounts of data produced by today's technologies hinder the correct operation of machine learning algorithms, both because of memory requirements and because of execution times. Processes that reduce both the number and the size of the data are therefore increasingly important; one of them is so-called instance selection. In this paper we propose three-objective constrained optimization models that formulate instance selection wrapper and filter methods (separately) for classification problems, which are solved with multi-objective evolutionary algorithms and multi-objective differential evolution. In the proposed instance selection wrapper method, an objective that minimizes the generalization error of the classifier is added to the usual ones. The proposed instance selection filter method simultaneously optimizes the correlation, redundancy and consistency of the datasets. Instance-retention constraints are imposed on the optimization models so that, in big data scenarios, at most a percentage of samples established by the decision maker is retained. The experiments were designed to compare (1) the NSGA-II and MODE algorithms, (2) two- and three-objective optimization models, (3) two different constraint-handling techniques, and (4) the proposed evolutionary approaches against 12 other non-evolutionary approaches used in the literature. The proposed wrapper and filter instance selection methods have been applied to a real-world business engineering application, and have also been validated on three public datasets to facilitate the replicability of the research results. The experimental results show the superiority of the three-objective constrained evolutionary techniques proposed in this paper over the non-evolutionary techniques and over the two-objective evolutionary approaches used in the literature.
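To make the wrapper idea concrete, the following is a minimal, stdlib-only sketch, not the paper's NSGA-II or MODE implementations. The toy dataset, the 1-NN classifier, the retention cap `MAX_RETAIN`, the exact objective definitions (retained fraction, training error, validation error as a generalization proxy), and the mutation-only evolutionary loop with a Deb-style feasibility rule are all simplifying assumptions chosen for illustration.

```python
import random

# Toy two-class dataset; all data and parameter values here are assumed
# for illustration, not taken from the paper.
random.seed(0)
points = [((random.gauss(c, 1.0), random.gauss(c, 1.0)), c)
          for c in (0, 3) for _ in range(20)]
random.shuffle(points)
train, val = points[:20], points[20:]

MAX_RETAIN = 0.5  # decision-maker's retention cap (hypothetical value)

def dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def nn_error(mask, testset):
    """1-NN error on testset using only the training instances kept by mask."""
    kept = [t for t, m in zip(train, mask) if m]
    if not kept:
        return 1.0
    wrong = sum(min(kept, key=lambda kl: dist(kl[0], p))[1] != lab
                for p, lab in testset)
    return wrong / len(testset)

def evaluate(mask):
    """Constraint violation plus three objectives to minimize:
    retained fraction, training error, validation (generalization) error."""
    frac = sum(mask) / len(mask)
    viol = max(0.0, frac - MAX_RETAIN)  # instance-retention constraint
    return viol, (frac, nn_error(mask, train), nn_error(mask, val))

def better(a, b):
    """Deb-style feasibility rule first, then Pareto dominance."""
    va, oa = a
    vb, ob = b
    if va != vb:
        return va < vb
    return (all(x <= y for x, y in zip(oa, ob))
            and any(x < y for x, y in zip(oa, ob)))

def mutate(mask, rate=0.1):
    return [1 - b if random.random() < rate else b for b in mask]

def pareto_front(pop):
    evals = [evaluate(m) for m in pop]
    return [m for m, e in zip(pop, evals)
            if not any(better(e2, e) for e2 in evals if e2 != e)]

# Simple evolutionary loop over binary instance-selection masks.
pop = [[random.randint(0, 1) for _ in train] for _ in range(30)]
for _ in range(40):
    pop = pop + [mutate(m) for m in pop]   # offspring by bit-flip mutation
    front = pareto_front(pop)
    while len(front) < 30:                 # refill to keep diversity
        front.append(mutate(random.choice(front)))
    pop = front[:60]

# Pick the front member with the lowest validation error.
best = min(pareto_front(pop), key=lambda m: evaluate(m)[1][2])
print(sum(best), "instances kept, validation error", evaluate(best)[1][2])
```

A real implementation would add crossover, crowding-distance selection, and the filter objectives (correlation, redundancy, consistency); the sketch only shows how a retention constraint and a generalization-error objective fit into a Pareto-based search over instance subsets.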
