Progressive sampling-based Bayesian optimization for efficient and automatic machine learning model selection

Xueqiang Zeng, Gang Luo

doi:10.1007/s13755-017-0023-z

Abstract

Purpose: Machine learning is broadly used for clinical data analysis. Before a model is trained, a machine learning algorithm must be selected, and the values of one or more model parameters, termed hyper-parameters, must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, various automatic selection methods for algorithms and/or hyper-parameter values have been proposed, but existing methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era.

Methods: To address this challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values.

Results: We report an implementation of the method. We show that, compared with a state-of-the-art automatic selection method, our method can significantly reduce search time, classification error rate, and the standard deviation of the error rate due to randomization.

Conclusions: This is major progress towards enabling the fast turnaround in identifying high-quality solutions that many machine learning-based clinical data analysis tasks require.
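The Methods paragraph names progressive sampling as the ingredient that targets large data sets. As a rough illustration of that general idea only, the Python sketch below scores candidate configurations on a small data sample, keeps the best fraction, and re-scores the survivors on progressively larger samples. It is not the paper's algorithm: it omits Bayesian optimization entirely and simply prunes a fixed candidate list (a scheme close to successive halving). The function `progressive_sampling_search`, the `evaluate(candidate, n)` callback (assumed to return an estimated error rate on a sample of `n` rows), and the `growth` and `keep_fraction` parameters are all hypothetical.

```python
from __future__ import annotations

from collections.abc import Callable, Hashable, Sequence


def progressive_sampling_search(
    candidates: Sequence[Hashable],
    evaluate: Callable[[Hashable, int], float],
    total_rows: int,
    initial_rows: int = 100,
    growth: int = 2,
    keep_fraction: float = 0.5,
) -> Hashable:
    """Prune candidates on growing data samples and return the survivor.

    Each round scores every surviving candidate with evaluate(candidate, n),
    i.e. its estimated error rate on a sample of n rows, keeps the best
    keep_fraction of them, and multiplies n by growth. The search stops when
    one candidate remains or a round has already used all total_rows rows.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    survivors = list(candidates)
    n = min(initial_rows, total_rows)
    while len(survivors) > 1:
        # Lowest estimated error first; each survivor is evaluated once per round.
        scored = sorted(survivors, key=lambda c: evaluate(c, n))
        if n >= total_rows:
            return scored[0]  # full data set reached: pick the best outright
        keep = max(1, int(len(scored) * keep_fraction))
        survivors = scored[:keep]
        n = min(n * growth, total_rows)
    return survivors[0]


# Hypothetical usage: four configurations with made-up "true" error rates.
# The toy estimate adds a bias of 1/n that shrinks as the sample grows,
# and every call is logged so the amount of work can be inspected.
TRUE_ERROR = {"a": 0.30, "b": 0.20, "c": 0.25, "d": 0.10}
calls = []


def toy_evaluate(config, n_rows):
    calls.append((config, n_rows))
    return TRUE_ERROR[config] + 1.0 / n_rows


best = progressive_sampling_search(list(TRUE_ERROR), toy_evaluate, total_rows=1000)
rows_used = sum(n for _, n in calls)
print(best, rows_used)
```

With these toy numbers the search scores 4 candidates on 100 rows and 2 on 200 rows, processing 800 rows in total instead of the 4,000 needed to score all four candidates on the full 1,000 rows. The price is that cheap early rounds can mis-rank candidates when small-sample estimates are noisy, a trade-off any real implementation has to manage.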

Highlights

Machine learning is a key technology for modern clinical data analysis and can be used to support many clinical applications
We show that, compared with a state-of-the-art automatic selection method, our method can significantly reduce search time, classification error rate, and the standard deviation of the error rate due to randomization
We present several new optimizations tailored to automatic machine learning model selection

Summary

Introduction

Machine learning is a key technology for modern clinical data analysis and can be used to support many clinical applications. To make machine learning accessible, statistics and computer science researchers have built various open source software tools such as Weka [6], scikit-learn [7], PyBrain [8], RapidMiner, R, and KNIME [9]. These software tools integrate many machine learning algorithms, and several of them provide intuitive graphical user interfaces. A detailed review of existing automatic selection methods for algorithms and/or hyper-parameter values is provided in our papers [11, 15].

Given a data set $D$ and a set of algorithms $\mathcal{A}$, each algorithm $A \in \mathcal{A}$ has a hyper-parameter space $\Lambda$, and $A_{\lambda}$ denotes $A$ run with hyper-parameter values $\lambda \in \Lambda$. The generalization performance of $A_{\lambda}$ is estimated by $M(A_{\lambda}, D)$, the error rate attained by $A_{\lambda}$ when trained and tested on $D$, e.g., via stratified multi-fold cross validation to decrease the possibility of overfitting [6]. Using this estimate, the objective of machine learning model selection is to find

$$A^{*}_{\lambda^{*}} \in \arg\min_{A \in \mathcal{A},\, \lambda \in \Lambda} M(A_{\lambda}, D).$$
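To make the objective concrete, here is a minimal Python sketch that estimates $M(A_{\lambda}, D)$ by stratified 3-fold cross validation and then takes the argmin by brute force. Everything in it is hypothetical: the toy two-cluster data set, the two stand-in algorithms (k-nearest neighbours with $\lambda = k$, and a majority-class baseline that has no real hyper-parameter), and the helper names. Exhaustive grid search is used only because it shows the objective most directly; it is the expensive approach that smarter search methods such as Bayesian optimization aim to avoid.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy data set D: two well-separated 2-D clusters, 6 rows per class.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5),
     (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5), (5, 5.5)]
y = [0] * 6 + [1] * 6


def stratified_folds(labels, n_folds, seed=0):
    """Split row indices into n_folds folds with (near-)equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % n_folds].append(i)  # deal each class round-robin
    return folds


# Each hypothetical algorithm A maps (train_X, train_y, x, lam) to a predicted label.
def knn_predict(train_X, train_y, x, k):
    """k-nearest neighbours with squared Euclidean distance; lam is k."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def majority_predict(train_X, train_y, x, _lam):
    """Baseline that ignores x and predicts the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]


def cv_error(algorithm, lam, X, y, n_folds=3, seed=0):
    """Estimate M(A_lambda, D) as the mean error rate over stratified folds."""
    rates = []
    for test_idx in stratified_folds(y, n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        wrong = sum(algorithm(train_X, train_y, X[i], lam) != y[i] for i in test_idx)
        rates.append(wrong / len(test_idx))
    return sum(rates) / len(rates)


def select_model(search_space, X, y):
    """Exhaustively search every (A, lambda) pair; return the one minimizing M."""
    best = None
    for name, (algorithm, grid) in search_space.items():
        for lam in grid:
            err = cv_error(algorithm, lam, X, y)
            if best is None or err < best[2]:
                best = (name, lam, err)
    return best


search_space = {
    "majority": (majority_predict, [None]),
    "knn": (knn_predict, [1, 3, 5]),
}
best = select_model(search_space, X, y)
print(best)  # (algorithm name, hyper-parameter value, estimated error rate)
```

On this separable toy data every tested $k$ reaches zero cross-validated error while the baseline misclassifies half of each fold, so the search returns `('knn', 1, 0.0)`; ties are broken in favour of the first pair visited.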


Journal: Health Information Science and Systems	Publication Date: Sep 27, 2017
Citations: 95	License type: open-access

Similar Papers

MLBCD: a machine learning tool for big clinical data.
Gang Luo
Health Information Science and Systems | VOL. 3
28 Sep 2015

The rise of machine learning for big data analytics
Rayner Alfred
01 Oct 2016

Automating Construction of Machine Learning Models With Clinical Big Data: Proposal Rationale and Methods.
Gang Luo ... Sean D Mooney
JMIR Research Protocols | VOL. 6
29 Aug 2017

Machine Learning in Big Data
Lidong Wang
International Journal of Advances in Applied Sciences | VOL. 4
01 Dec 2015
