Distributed Deep Learning Using Volunteer Computing-Like Paradigm

Medha Atre, Ashwini Rao, Birendra Jha

doi:10.1109/ipdpsw52791.2021.00144

Abstract

The use of Deep Learning (DL) in commercial applications such as image classification, sentiment analysis and speech recognition is increasing. When training DL models with a large number of parameters and/or large datasets, the cost and duration of training can become prohibitive. Distributed DL training solutions that split a training job into subtasks and execute them over multiple nodes can decrease training time. However, the cost of current solutions, built predominantly for cluster computing systems, can still be an issue. In contrast to cluster computing systems, Volunteer Computing (VC) systems can lower the cost of computing, but applications running on VC systems have to handle fault tolerance, variable network latency and heterogeneity of compute nodes, and the current solutions are not designed to do so. We design a distributed solution that can run DL training on a VC system by using a data parallel approach. We implement a novel asynchronous SGD scheme called VC-ASGD suited for VC systems. In contrast to traditional VC systems that lower cost by using untrustworthy volunteer devices, we lower cost by leveraging preemptible computing instances on commercial cloud platforms. By using preemptible instances that require applications to be fault tolerant, we lower cost by 70-90% and improve data security.
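To make the data parallel, asynchronous idea concrete, the following is a minimal single-process sketch of generic asynchronous SGD with simulated preemption. It is not the VC-ASGD scheme itself, whose details the abstract does not give. Everything in it is an assumption chosen for illustration: the toy linear model, the function names `make_data`, `gradient` and `train_async`, the shard layout, and the preemption probability.

```python
import random

def make_data(n, rng):
    # Synthetic 1-D regression data: y = 3x + 1 plus small Gaussian noise.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.01)) for x in xs]

def gradient(params, batch):
    # Mean-squared-error gradient for the model y_hat = w * x + b.
    w, b = params
    gw = gb = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    n = len(batch)
    return gw / n, gb / n

def train_async(data, n_workers=4, steps=2000, lr=0.05,
                preempt_prob=0.2, seed=0):
    rng = random.Random(seed)
    # Data parallelism: each (hypothetical) worker owns one shard of the data.
    shards = [data[i::n_workers] for i in range(n_workers)]
    server = [0.0, 0.0]  # parameters [w, b] held by the central server
    # Each worker computes on a snapshot that may be stale by the time
    # its gradient reaches the server.
    snapshots = [tuple(server) for _ in range(n_workers)]
    lost = 0
    for _ in range(steps):
        k = rng.randrange(n_workers)  # workers report in arbitrary order
        if rng.random() < preempt_prob:
            # Simulated preemption: the in-flight subtask is discarded and
            # the worker restarts from the current server parameters.
            lost += 1
            snapshots[k] = tuple(server)
            continue
        batch = rng.sample(shards[k], 8)
        gw, gb = gradient(snapshots[k], batch)  # gradient at stale parameters
        server[0] -= lr * gw
        server[1] -= lr * gb
        snapshots[k] = tuple(server)  # worker pulls fresh parameters
    return tuple(server), lost
```

Calling `train_async(make_data(400, random.Random(1)))` returns the learned `(w, b)` pair and the number of discarded subtasks. In this toy setting the parameters should end up close to the generating values even though a fraction of the work is lost, which is the fault-tolerance property that preemptible instances require.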
