Abstract

To realize smart IoT services such as intelligent video surveillance, smart cities, and autonomous driving, a tremendous number of distributed machine learning jobs will train unbiased models over large datasets collected by geo-distributed wireless edge networks, adopting a parameter server (PS) architecture. Training in unbiased distributed learning (UDL) relies on geo-distributed data and incurs high response latency and bandwidth consumption, which introduces a new challenge: how to schedule UDL jobs so that the response latency (training time) is minimized while the expensive bandwidth cost among geo-distributed sites in the wireless edge network is reduced. To address this challenge, we propose two online scheduling algorithms, Okita and Okita∗, that aim at long-term overall cost minimization. Okita schedules UDL jobs at each time slot in a preemptive manner, jointly deciding the execution time window, the amount of training data, and the number and location of concurrent workers and PSs at each site, whereas Okita∗ schedules jobs in a non-preemptive fashion. To evaluate the practical effectiveness of the proposed algorithms, we conduct both testbed experiments and large-scale simulations. The results show that our proposed algorithms can reduce the total training cost by up to 70% compared with three classical schedulers used in today’s cloud systems.
