Abstract

In the era of big data, machine learning-based data analysis has been integrated into almost every aspect of modern life. Before machine learning can be applied, an algorithm and suitable hyper-parameter values must be chosen, which requires substantial machine learning expertise and many manual iterations. To popularize machine learning and allow non-professionals to use it to solve problems, automatic machine learning model selection is particularly important. Among existing automatic model selection methods, Progressive Sampling-based Bayesian Optimization (PSBO) is one of the most efficient and effective. However, PSBO performs progressive sampling with a traditional random sampling strategy, which does not consider the importance of individual samples. Based on the idea that more important and effective samples lead to better model training results, this paper proposes Sample Importance Guided Progressive Sampling-based Bayesian Optimization (SIG-PSBO) for automatic machine learning. SIG-PSBO defines sample importance by how difficult it is to distinguish categories in a PCA feature space. Samples with higher importance are then more likely to be drawn for subsequent model training. Extensive experimental results show that SIG-PSBO significantly shortens the search time and reduces classification error rates compared with the original PSBO method.
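
The idea of importance-guided progressive sampling can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the exact importance formula, so the score below (the share of a point's k nearest neighbours that carry a different label) is an assumed stand-in for "difficulty to distinguish categories", and the points are assumed to be already projected into a low-dimensional (e.g. PCA) space. The names importance_scores and progressive_samples, the floor parameter, and the toy data are all hypothetical.

```python
import math
import random


def importance_scores(points, labels, k=3, floor=0.05):
    """Assumed importance: fraction of the k nearest neighbours with a different label.

    `floor` keeps every sample drawable even when its neighbourhood is pure.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        neighbours = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        if neighbours:
            mixed = sum(labels[j] != labels[i] for _, j in neighbours) / len(neighbours)
        else:
            mixed = 0.0
        scores.append(max(mixed, floor))
    return scores


def progressive_samples(weights, sizes, seed=0):
    """Return nested index samples of increasing size, biased towards high weights.

    Uses Efraimidis-Spirakis keys (u ** (1 / w)) for weighted sampling without
    replacement; taking prefixes of one ordering makes each sample extend the last.
    """
    rng = random.Random(seed)
    keys = [rng.random() ** (1.0 / w) for w in weights]
    order = sorted(range(len(weights)), key=lambda i: keys[i], reverse=True)
    return [order[:n] for n in sizes]


# Toy 2-D data (hypothetical): two clusters plus two points near the class boundary.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.9, 1.0),
          (1.0, 0.9), (1.1, 1.1), (0.5, 0.5), (0.55, 0.5)]
labels = [0, 0, 0, 1, 1, 1, 0, 1]

scores = importance_scores(points, labels, k=3)
samples = progressive_samples(scores, sizes=[2, 4, 8])
```

In this sketch the two boundary points receive higher scores than points deep inside a cluster, so they tend to appear in the earlier, smaller samples. A random-sampling baseline in the style of PSBO would correspond to giving every sample the same weight.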
