Abstract
Algorithm selection is a challenging task in machine learning. Meta-learning treats algorithm selection as a supervised learning task in which the training examples (meta-examples) are generated from experiments performed with a set of candidate algorithms on several datasets. In some domains, the limited availability of real datasets makes it difficult to generate good sets of meta-examples. An alternative is to use synthetic datasets. Unfortunately, not all synthetic datasets are equally relevant and representative with respect to real datasets. Simply adopting a large number of arbitrary synthetic datasets therefore increases the computational cost of the experiments without necessarily improving the quality of meta-learning. In this paper, we treat the selection of relevant synthetic datasets for meta-learning as a One-Class Classification (OCC) problem. This problem assumes that a set of instances belonging to a single class of interest (the positive class) is available, together with a large set of unlabelled instances (the unknown class). The objective is to identify which unlabelled instances most likely belong to the positive class. In our context, OCC techniques are used to select the most relevant synthetic datasets (unknown class), given the available real datasets (positive class). We conducted experiments in a case study in which we adopted a data manipulation procedure to produce synthetic datasets and two OCC techniques for dataset selection. The results show that it is possible to select a reduced number of synthetic datasets while maintaining or even increasing meta-learning performance.
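As an illustration of the selection idea only, the sketch below treats each dataset as a vector of meta-features and keeps the synthetic datasets that lie close to the real ones. The abstract does not name the two OCC techniques used, so the nearest-neighbour rule here is a stand-in chosen for simplicity, and the two-dimensional meta-feature vectors, the threshold rule, and the function names (`nn_distance`, `select_synthetic`) are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two meta-feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_distance(point, references, skip_index=None):
    """Distance from `point` to its nearest reference vector.

    `skip_index` excludes one reference, which allows leave-one-out
    distances to be computed within the positive class itself.
    """
    return min(
        euclidean(point, ref)
        for i, ref in enumerate(references)
        if i != skip_index
    )


def select_synthetic(real, synthetic):
    """Return the indices of synthetic datasets accepted as 'positive'.

    Assumed rule (not taken from the paper): a synthetic dataset is accepted
    when its distance to the nearest real dataset does not exceed the largest
    leave-one-out nearest-neighbour distance observed among the real datasets.
    """
    threshold = max(
        nn_distance(vec, real, skip_index=i) for i, vec in enumerate(real)
    )
    return [
        i for i, vec in enumerate(synthetic)
        if nn_distance(vec, real) <= threshold
    ]


# Hypothetical meta-feature vectors: real datasets form the positive class,
# synthetic datasets form the unlabelled (unknown) class.
real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = [(0.5, 0.5), (5.0, 5.0), (1.2, 0.9), (-3.0, 0.0)]

selected = select_synthetic(real, synthetic)
print(selected)  # [0, 2]
```

Only the selected synthetic datasets would then be used to generate additional meta-examples, which is where the saving in experimental cost would come from.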