Abstract

In many scenarios (e.g., hurricanes, earthquakes, rural areas), edge devices cannot access the cloud, which makes the cloud-based deep learning (DL) training approach inapplicable. However, an edge device may not be able to train a large-scale DL model due to its resource constraints. Although mobile-friendly DL models exist (e.g., MobileNet, ShuffleNet), they cannot meet the needs of different Deep Neural Networks (DNNs), and model compression sacrifices accuracy. Distributed DL training among multiple edge devices is a solution. However, it poses challenges: how to partition a DNN model and assign the partitions among edge devices given the DNN's features and each device's resource availability, and how to handle edge-device overload to reduce the overall job time and accuracy loss. To address these challenges, we propose both heuristic and Reinforcement Learning (RL) based DL job schedulers that leverage DL job features. Our container-based emulation and real-device experiments show that our job schedulers achieve up to 82% improvement in training time and 70% in consumed energy over comparison methods. We have also open-sourced our code.
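To illustrate the partitioning challenge described above, here is a minimal, hypothetical sketch of a heuristic that splits a DNN's layers across edge devices in proportion to each device's compute capacity. The per-layer costs, device capacities, and the greedy rule are illustrative assumptions, not the paper's actual schedulers.

```python
# Hypothetical heuristic: assign consecutive DNN layers to edge devices so
# each device's share of the total work roughly matches its share of the
# total compute capacity. All numbers below are illustrative placeholders.

def partition_layers(layer_costs, device_capacities):
    """Return a list of layer-index groups, one group per device."""
    total_cost = sum(layer_costs)
    total_cap = sum(device_capacities)
    partitions = [[] for _ in device_capacities]
    dev, used = 0, 0.0
    for i, cost in enumerate(layer_costs):
        # Each device's budget is its proportional share of the total work.
        budget = total_cost * device_capacities[dev] / total_cap
        if used + cost > budget and dev < len(device_capacities) - 1 and partitions[dev]:
            dev, used = dev + 1, 0.0  # spill over to the next device
        partitions[dev].append(i)
        used += cost

    return partitions

layer_costs = [4, 4, 2, 2, 1, 1]   # per-layer FLOP estimates (made up)
capacities = [2, 1, 1]             # relative device compute (made up)
print(partition_layers(layer_costs, capacities))
```

A real scheduler would also weigh communication cost at the cut points and memory limits per device; this sketch only captures the compute-balancing intuition.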
