Abstract

In general, building a high-quality machine learning model is an iterative, complex and time-consuming process that involves exploring the performance of various machine learning algorithms and requires a good understanding of, and experience with, effectively tuning their hyper-parameters. In practice, conducting this process efficiently requires solid knowledge of and experience with the various techniques that can be employed. With the continuous and vast increase in the amount of data in our digital world, it has been acknowledged that the number of knowledgeable data scientists cannot scale to address these challenges. Thus, there is a crucial need for automating the process of Combined Algorithm Selection and Hyper-parameter tuning (CASH) in the machine learning domain. Recently, several systems (e.g., AutoWeka, AutoSKLearn, SmartML) have been introduced to tackle this challenge, with the main aim of reducing the role of the human in the loop and filling the gap for non-expert machine learning users by playing the role of the data scientist. Meta-Learning is the process of learning from previous experience gained by applying various learning algorithms to different types of data, thereby reducing the time needed to learn new tasks. In the context of the Automated Machine Learning (AutoML) process, one main advantage of meta-learning techniques is that they allow hand-engineered algorithms to be replaced with novel automated methods that are designed in a data-driven way. In this paper, we present a methodology and framework for using Meta-Learning techniques to develop new methods that serve as effective decision support for the AutoML process. In particular, we use Meta-Learning techniques to answer several crucial questions for the AutoML process, including: 1) Which classifiers are expected to perform best on a given dataset? 2) Can we predict the training time of a classifier? 3) Which classifiers are worth investing a larger portion of the time budget in to improve their performance by tuning? In our Meta-Learning process, we used 200 datasets with different characteristics, described by a wide set of meta-features. In addition, we used 30 classifiers from two popular machine learning libraries, namely Weka and Scikit-learn. Our results and Meta-Models have been obtained in a fully automated way. The methodology and results of our framework can be easily embedded in or utilized by any AutoML system.
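
To illustrate the kind of decision support such a framework provides, the following is a minimal sketch (in Python with Scikit-learn, one of the libraries the paper uses) of how a meta-learning recommendation step could be organized: simple meta-features are extracted from each historical dataset, and a meta-model is trained on pairs of meta-features and the classifier that performed best, so that it can recommend a classifier for a new dataset. The specific meta-features, labels, and function names below are illustrative assumptions, not the paper's actual feature set or meta-models.

# Illustrative meta-learning sketch; meta-features and labels are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def extract_meta_features(X, y):
    # Compute a few simple dataset meta-features: number of instances,
    # number of attributes, number of classes, and class imbalance ratio.
    n_instances, n_features = X.shape
    class_counts = np.bincount(y)
    imbalance = class_counts.max() / max(class_counts.min(), 1)
    return [n_instances, n_features, len(class_counts), imbalance]

# Hypothetical meta-dataset: one row of meta-features per previously seen dataset,
# labelled with the classifier that performed best on it.
meta_X = np.array([
    [1000, 20, 2, 1.2],
    [500, 100, 5, 3.0],
    [20000, 10, 2, 1.0],
])
meta_y = np.array(["RandomForest", "SVM", "LogisticRegression"])

# Meta-model: predicts which classifier is expected to perform best on a new dataset.
meta_model = RandomForestClassifier(n_estimators=100, random_state=0)
meta_model.fit(meta_X, meta_y)

# For a new dataset (X_new, y_new), recommend a classifier from its meta-features.
X_new = np.random.rand(800, 15)
y_new = np.random.randint(0, 3, 800)
recommendation = meta_model.predict([extract_meta_features(X_new, y_new)])[0]
print("Recommended classifier:", recommendation)

The same pattern generalizes to the paper's other questions: with a regression meta-model, the target can be a classifier's training time instead of a class label, and with a meta-model trained on observed improvements from tuning, it can rank classifiers by how much of the time budget they are worth.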
