Abstract

Meta-learning, or learning to learn, is a machine learning approach that utilizes prior learning experiences to expedite the learning process on unseen tasks. As a data-driven approach, meta-learning requires meta-features that represent the primary learning tasks or datasets. These are traditionally estimated as engineered dataset statistics that require expert domain knowledge tailored to every meta-task. In this paper, first, we propose a meta-feature extractor called Dataset2Vec that combines the versatility of engineered dataset meta-features with the expressivity of meta-features learned by deep neural networks. Primary learning tasks or datasets are represented as hierarchical sets, i.e., as a set of sets, specifically as a set of predictor/target pairs, and then a DeepSet architecture is employed to regress meta-features on them. Second, we propose a novel auxiliary meta-learning task with abundant data called dataset similarity learning that aims to predict whether two batches stem from the same dataset or from different ones. In an experiment on a large-scale hyperparameter optimization task for 120 UCI datasets with varying schemas as a meta-learning task, we show that the meta-features of Dataset2Vec outperform the expert engineered meta-features and thus demonstrate the usefulness of learned meta-features for datasets with varying schemas for the first time.

Highlights

  • Meta-learning, or learning to learn, refers to any learning approach that systematically makes use of prior learning experiences to accelerate training on unseen tasks or datasets (Vanschoren 2018)

  • A simpler, unsupervised plausibility argument for the usefulness of the extracted meta-features is depicted in Fig. 1, which shows a 2D embedding of the meta-features of 2000 synthetic classification toy datasets of three different types computed by (a) two sets of engineered dataset meta-features: MF1 (Wistuba et al 2016) and MF2 (Feurer et al 2015); (b) a state-of-the-art model based on variational autoencoders, the Neural Statistician (Edwards and Storkey 2017b); and (c) the proposed meta-feature extractor Dataset2Vec

  • We show experimentally that using the meta-features extracted through Dataset2Vec for the hyperparameter optimization meta-task outperforms the use of engineered meta-features designed for this meta-task


Summary

Introduction

Meta-learning, or learning to learn, refers to any learning approach that systematically makes use of prior learning experiences to accelerate training on unseen tasks or datasets (Vanschoren 2018). We design a novel meta-feature extractor called Dataset2Vec that learns meta-features from (tabular) datasets with a varying number of instances, predictors, or targets. A simpler, unsupervised plausibility argument for the usefulness of the extracted meta-features is depicted in Fig. 1, which shows a 2D embedding of the meta-features of 2000 synthetic classification toy datasets of three different types (circles/moon/blobs) computed by (a) two sets of engineered dataset meta-features: MF1 (Wistuba et al 2016) and MF2 (Feurer et al 2015) (see Table 3); (b) a state-of-the-art model based on variational autoencoders, the Neural Statistician (Edwards and Storkey 2017b); and (c) the proposed meta-feature extractor Dataset2Vec. For the 2D embedding, multi-dimensional scaling (Borg and Groenen 2003) has been applied to these meta-features. We show experimentally that using the meta-features extracted through Dataset2Vec for the hyperparameter optimization meta-task outperforms the use of engineered meta-features designed for this meta-task.
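
To make the idea of extracting meta-features from datasets with varying schemas more concrete, the following is a minimal NumPy sketch of a DeepSet-style encoder over predictor/target pairs. It is an illustration under stated assumptions, not the authors' implementation: the function names (`dense`, `extract_meta_features`), the layer sizes, the use of mean pooling, and the random untrained weights are all hypothetical choices made for this example. It only demonstrates the two structural properties described above: the output has a fixed size regardless of the number of instances, predictors, or targets, and it does not depend on the order of rows or columns.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    """Single ReLU layer with random, untrained weights (placeholder for a trained network)."""
    W = rng.normal(scale=0.5, size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    return lambda x: np.maximum(x @ W + b, 0.0)

HIDDEN, OUT = 16, 8          # hypothetical layer sizes
f = dense(2, HIDDEN)         # applied to each (predictor value, target value) pair
g = dense(HIDDEN, HIDDEN)    # applied to each per-(predictor, target) summary
h = dense(HIDDEN, OUT)       # maps the pooled summary to the meta-feature vector

def extract_meta_features(X, Y):
    """Map predictors X of shape (n, m) and targets Y of shape (n, t) to a vector of shape (OUT,)."""
    # Build all pairs (X[i, a], Y[i, b]); resulting shape is (n, m, t, 2).
    pairs = np.stack(np.broadcast_arrays(X[:, :, None], Y[:, None, :]), axis=-1)
    inner = f(pairs).mean(axis=0)        # pool over instances -> (m, t, HIDDEN)
    outer = g(inner).mean(axis=(0, 1))   # pool over predictors and targets -> (HIDDEN,)
    return h(outer)

# Two toy datasets with different schemas yield meta-features of the same size.
X1, Y1 = rng.normal(size=(50, 3)), rng.normal(size=(50, 1))
X2, Y2 = rng.normal(size=(20, 7)), rng.normal(size=(20, 2))
phi1 = extract_meta_features(X1, Y1)
phi2 = extract_meta_features(X2, Y2)

# Shuffling rows and predictor columns leaves the meta-features unchanged (up to rounding).
row_perm, col_perm = rng.permutation(50), rng.permutation(3)
phi1_shuffled = extract_meta_features(X1[row_perm][:, col_perm], Y1[row_perm])
```

Mean pooling is used here only because it is the simplest permutation-invariant aggregation; a trained extractor would learn the weights of `f`, `g`, and `h`, for example on an auxiliary task such as dataset similarity learning.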

Related work
Problem setting: meta-feature learning
The meta-feature extractor Dataset2Vec
Preliminaries
Hierarchical set modeling of datasets
Network architecture
The auxiliary meta-task: dataset similarity learning
The auxiliary problem
The auxiliary meta model and training
Experiments
Dataset similarity learning for datasets of similar schema
Baselines
Evaluation metric
Toy meta dataset
Dataset similarity learning for datasets of different schema
UCI meta dataset
Hyperparameter optimization
Evaluation metrics
UCI surrogate dataset
Conclusion