Abstract

Deep neural networks (deep nets) are revolutionizing many machine learning (ML) applications. But there is a major bottleneck to wider adoption: the pain and resource intensiveness of model selection. This empirical process involves exploring deep net architectures and hyper-parameters, often requiring hundreds of trials. Alas, most ML systems focus on training one model at a time, reducing throughput and raising overall resource costs; some also sacrifice reproducibility. We present Cerebro, a new data system to raise deep net model selection throughput at scale without raising resource costs and without sacrificing reproducibility or accuracy. Cerebro uses a new parallel SGD execution strategy we call model hopper parallelism that hybridizes task- and data-parallelism to mitigate the cons of these prior paradigms and offer the best of both worlds. Experiments on large ML benchmark datasets show that Cerebro offers 3x to 10x runtime savings relative to data-parallel systems like Horovod and Parameter Server and up to 8x memory/storage savings or up to 100x network savings relative to task-parallel systems. Cerebro also supports heterogeneous resources and fault tolerance.
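
To make the idea of model hopper parallelism concrete, below is a minimal conceptual sketch (not Cerebro's actual code) of its scheduling pattern: the dataset is partitioned once across workers, and each model configuration "hops" between workers, training on one local partition per sub-epoch, so every model sees the full data each epoch without repeatedly shuffling data or synchronizing gradients per mini-batch. The names `configs`, `train_sub_epoch`, and the round-robin assignment are illustrative assumptions; Cerebro itself uses a more general scheduler.

```python
def model_hopper_epoch(configs, num_workers, train_sub_epoch):
    """One epoch of (simplified) model hopper parallelism.

    configs: list of model states, one per hyper-parameter configuration
             (assumed here to be at most num_workers long)
    num_workers: number of workers, each holding one fixed data partition
    train_sub_epoch(model, worker_id): trains `model` for one sub-epoch on
        the partition held by `worker_id` and returns the updated model.
    """
    for round_idx in range(num_workers):
        # In each round, every model is placed on a different worker;
        # in a real system these sub-epoch tasks run concurrently.
        for model_idx, model in enumerate(configs):
            worker_id = (model_idx + round_idx) % num_workers
            configs[model_idx] = train_sub_epoch(model, worker_id)
    # After num_workers rounds, every model has visited every partition
    # exactly once, i.e., completed one full pass over the data.
    return configs
```

Because only model state (not data) moves between workers, this pattern avoids the per-step communication of data-parallel SGD while still giving each configuration a full epoch over the data, which is the hybrid behavior the abstract describes.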
