Progressive Sampling-Based Joint Automatic Model Selection of Machine Learning and Feature Selection

Sufen Chen, Xiaoyan Zeng

doi:10.23977/jaip.2020.040104

Abstract

In most machine learning applications, selecting an appropriate model requires expert knowledge and many labor-intensive manual iterations. Automatic machine learning is therefore important for lowering the barrier to applying machine learning. Feature selection is also an important data preprocessing step: selecting important features alleviates the curse of dimensionality, and removing irrelevant features makes the learning task easier. Existing automatic selection methods cannot select the machine learning model and the feature selection model simultaneously on large-scale data. To address this in the era of big data, this paper proposes a unified hyperparameter space covering both machine learning and feature selection, and adopts a Bayesian optimization model based on progressive sampling for automatic model selection. Extensive experiments show that our approach significantly reduces search time and classification error rates compared with state-of-the-art automated model selection methods.
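The idea of searching a joint space on progressively larger samples can be illustrated with a minimal sketch. It is not the paper's implementation. The synthetic data, the k-NN classifier, the mean-difference feature filter, the sample sizes, and all function names are assumptions made for illustration. Random candidate sampling with successive halving stands in for the Bayesian optimization surrogate.

```python
import random

def sample_config(rng, n_features):
    # Hypothetical joint space: one feature-selection hyperparameter (n_keep)
    # and one classifier hyperparameter (k for k-NN) drawn together.
    return {"n_keep": rng.randint(1, n_features), "k": rng.choice([1, 3, 5, 7])}

def make_data(rng, n, n_features=6):
    # Synthetic data (assumption): only features 0 and 1 carry class signal.
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label, 1.0) if j < 2 else rng.gauss(0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def rank_features(X, y):
    # Filter-style selection: rank features by absolute difference of class means.
    scores = []
    for j in range(len(X[0])):
        pos = [x[j] for x, t in zip(X, y) if t == 1]
        neg = [x[j] for x, t in zip(X, y) if t == 0]
        scores.append(abs(sum(pos) / max(len(pos), 1) - sum(neg) / max(len(neg), 1)))
    return sorted(range(len(X[0])), key=lambda j: -scores[j])

def knn_error(cfg, X_tr, y_tr, X_va, y_va):
    # Validation error of k-NN restricted to the top n_keep ranked features.
    keep = rank_features(X_tr, y_tr)[:cfg["n_keep"]]
    errors = 0
    for x, t in zip(X_va, y_va):
        dists = sorted((sum((x[j] - z[j]) ** 2 for j in keep), label)
                       for z, label in zip(X_tr, y_tr))
        votes = sum(label for _, label in dists[:cfg["k"]])
        pred = 1 if 2 * votes > cfg["k"] else 0
        errors += pred != t
    return errors / len(X_va)

def progressive_search(X, y, X_va, y_va, n_configs=12, sizes=(40, 80, 160), seed=0):
    # Evaluate many joint configurations on a small sample, then keep the
    # better half and re-evaluate the survivors on a larger sample.
    rng = random.Random(seed)
    candidates = [sample_config(rng, len(X[0])) for _ in range(n_configs)]
    for size in sizes:
        scored = sorted(candidates,
                        key=lambda c: knn_error(c, X[:size], y[:size], X_va, y_va))
        candidates = scored[:max(1, len(scored) // 2)]
    best = candidates[0]
    return best, knn_error(best, X[:sizes[-1]], y[:sizes[-1]], X_va, y_va)

rng = random.Random(42)
X, y = make_data(rng, 160)
X_va, y_va = make_data(rng, 60)
best_cfg, best_err = progressive_search(X, y, X_va, y_va)
print(best_cfg, best_err)
```

Most configurations are only ever scored on the smallest sample, which is where the savings in search time come from. The reported error is measured on the same validation split used for selection, so it is an optimistic estimate.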
