Efficient Identification of Approximate Best Configuration of Training in Large Datasets

Silu Huang, Surajit Chaudhuri, Chi Wang, Bolin Ding

doi:10.1609/aaai.v33i01.33013862


Open Access


Journal: Proceedings of the AAAI Conference on Artificial Intelligence	Publication Date: Jul 17, 2019
Citations: 8

Affiliation: University of Illinois Urbana-Champaign, Microsoft Research (United Kingdom), Alibaba Group (United States)

Abstract

A configuration of training refers to a combination of feature engineering, a learner, and the learner's associated hyperparameters. Given a set of configurations and a large dataset randomly split into training and testing sets, we study how to efficiently identify a configuration whose testing accuracy, when trained on the training set, is approximately the highest among all configurations. To guarantee a small accuracy loss, we develop a solution using a confidence interval (CI)-based progressive sampling and pruning strategy. Compared with using the full data to find the exact best configuration, our solution achieves a speedup of more than two orders of magnitude, while the returned top configuration has identical or close test accuracy.
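The abstract names the strategy but not its details. As a rough illustration only, the sketch below shows a generic successive-elimination loop of the kind the phrase "CI-based progressive sampling and pruning" suggests. It is not the authors' algorithm. The Hoeffding interval, the sample-doubling schedule, the pruning rule, and the names `progressive_select`, `hoeffding_radius`, and `make_simulated_evaluator` are all hypothetical choices made for this example. The toy evaluator also ignores the fact that a model trained on a small sample usually scores differently from one trained on the full training set, which a faithful method would have to account for.

```python
import math
import random
from typing import Callable, Dict, List


def hoeffding_radius(n: int, delta: float) -> float:
    """Half-width of a two-sided Hoeffding interval for a mean of n values in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def progressive_select(
    configs: List[str],
    evaluate: Callable[[str, int], float],
    n0: int = 1000,
    n_max: int = 1_000_000,
    delta: float = 0.05,
    epsilon: float = 0.01,
) -> str:
    """Hypothetical CI-based progressive sampling with pruning (illustrative only).

    Assumption: evaluate(config, n) returns an accuracy in [0, 1] estimated
    from n sampled examples.
    """
    alive = list(configs)
    n = n0
    while True:
        # Split the failure probability across surviving configs (union bound).
        # A rigorous version would also split delta across rounds.
        radius = hoeffding_radius(n, delta / len(alive))
        acc = {c: evaluate(c, n) for c in alive}
        best_lower = max(acc[c] - radius for c in alive)
        # Prune configs whose upper bound falls below the best lower bound;
        # the config that attains best_lower always survives.
        alive = [c for c in alive if acc[c] + radius >= best_lower]
        # Stop when one config is left, when every interval is narrow enough that
        # the empirical leader is within epsilon of the true best (if all
        # intervals hold), or when the sample budget is exhausted.
        if len(alive) == 1 or 2 * radius <= epsilon or n >= n_max:
            return max(alive, key=lambda c: acc[c])
        n = min(2 * n, n_max)  # progressive sampling: double the sample size


def make_simulated_evaluator(
    true_acc: Dict[str, float], seed: int = 0
) -> Callable[[str, int], float]:
    """Hypothetical toy stand-in for 'train on a sample, then score': each of
    the n evaluated predictions is correct with probability true_acc[config]."""
    rng = random.Random(seed)

    def evaluate(config: str, n: int) -> float:
        correct = sum(rng.random() < true_acc[config] for _ in range(n))
        return correct / n

    return evaluate


if __name__ == "__main__":
    truth = {"A": 0.90, "B": 0.80, "C": 0.70, "D": 0.89}
    best = progressive_select(
        list(truth), make_simulated_evaluator(truth, seed=42),
        n_max=64_000, epsilon=0.02,
    )
    print("selected:", best)
```

In this toy run, clearly worse configurations such as "C" are typically discarded after the first small samples, so most of the evaluation budget goes to the few close contenders. That concentration of effort on contenders is the intuition behind the speedup claimed above. The returned configuration is only guaranteed (under the stated assumptions) to be within `epsilon` of the best, so either "A" or "D" is an acceptable answer here.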
