Abstract
Algorithm selection is a challenging task in machine learning. Meta-learning treats algorithm selection as a supervised learning task, in which training examples (i.e., meta-examples) are generated from experiments performed with a set of candidate algorithms on several datasets. The limited availability of real datasets in some domains can make it difficult to generate good sets of meta-examples. An alternative is the use of synthetic datasets. Unfortunately, not all synthetic datasets are equally relevant and representative when compared to real datasets. Thus, simply adopting a high number of arbitrary synthetic datasets increases the computational cost of performing experiments, without necessarily improving the quality of meta-learning. In this paper, we treat the selection of relevant synthetic datasets for meta-learning as a One-Class Classification (OCC) problem. In this problem, only instances associated with a single class of interest (the positive class) are assumed to be available, together with a large set of unlabelled instances (the unknown class). The objective is to identify which unlabelled instances most likely belong to the positive class. In our context, OCC techniques are used to select the most relevant synthetic datasets (unknown class), based on the available real datasets (positive class). We conducted experiments in a case study in which we adopted a data manipulation procedure to produce synthetic datasets and two OCC techniques for dataset selection. The results revealed that it is indeed possible to select a reduced number of synthetic datasets while maintaining or even increasing meta-learning performance.
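To make the selection step concrete, the following is a minimal sketch of how an OCC technique could be applied in this setting; it is not the paper's implementation. It assumes scikit-learn's OneClassSVM as the OCC model (the abstract does not name the two techniques used) and hypothetical meta-feature matrices describing the real and synthetic datasets.

```python
# Minimal sketch (not the authors' implementation): selecting synthetic
# datasets via One-Class Classification, fitted on real datasets only.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Hypothetical inputs: one row of meta-features per dataset.
rng = np.random.default_rng(0)
real_meta = rng.random((30, 10))        # positive class: real datasets
synthetic_meta = rng.random((200, 10))  # unknown class: synthetic datasets

# Scale meta-features using statistics of the real datasets only.
scaler = StandardScaler().fit(real_meta)
X_real = scaler.transform(real_meta)
X_synth = scaler.transform(synthetic_meta)

# Fit the one-class model on the real datasets (the only labelled class).
occ = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_real)

# Keep only the synthetic datasets the model classifies as positive,
# i.e., those most similar to the real datasets in meta-feature space.
mask = occ.predict(X_synth) == 1
selected = synthetic_meta[mask]
print(f"Selected {mask.sum()} of {len(synthetic_meta)} synthetic datasets")
```

The selected subset would then replace the full pool of synthetic datasets when generating meta-examples, reducing experimental cost while keeping the meta-data representative of real problems.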