Abstract

In many scenarios (e.g., hurricanes, earthquakes, rural areas), edge devices cannot access the cloud, which makes the cloud-based deep learning (DL) training approach inapplicable. However, an edge device may not be able to train a large-scale DL model due to its resource constraints. Although there are mobile-friendly DL models (e.g., MobileNet, ShuffleNet), they cannot meet the needs of different Deep Neural Networks (DNNs), and model compression sacrifices accuracy. Distributed DL training among multiple edge devices is a solution. However, it poses challenges: how to partition a DNN model and assign the partitions among edge devices considering the DNN's features and the devices' resource availability, and how to handle edge-device overload to reduce the overall job time and accuracy loss. To address these challenges, we propose both heuristic and Reinforcement Learning (RL) based DL job schedulers that leverage DL job features. Our container-based emulation and real-device experiments show that our job schedulers achieve up to 82% improvement in training time and 70% in consumed energy over comparison methods. We have also open-sourced our code.
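To make the partition-and-assignment challenge concrete, here is a minimal greedy sketch (a hypothetical illustration, not the paper's actual scheduler): split a chain of DNN layers into contiguous partitions of roughly balanced compute cost, then map the heaviest partitions onto the fastest edge devices. The `layer_costs` and `device_speeds` inputs are assumed abstractions of per-layer FLOPs and per-device throughput.

```python
def assign_partitions(layer_costs, device_speeds, num_partitions):
    """Greedy sketch: cut a layer chain into contiguous partitions of
    roughly equal compute cost, then assign the heaviest partitions to
    the fastest devices. Illustrative only, not the paper's scheduler."""
    total = sum(layer_costs)
    target = total / num_partitions  # ideal per-partition compute cost
    partitions, current, acc = [], [], 0.0
    for i, cost in enumerate(layer_costs):
        current.append(i)
        acc += cost
        # Cut here once we reach the target, leaving room for the rest.
        if acc >= target and len(partitions) < num_partitions - 1:
            partitions.append(current)
            current, acc = [], 0.0
    if current:
        partitions.append(current)
    # Pair partitions (heaviest first) with devices (fastest first).
    by_cost = sorted(range(len(partitions)),
                     key=lambda p: sum(layer_costs[i] for i in partitions[p]),
                     reverse=True)
    by_speed = sorted(range(len(device_speeds)),
                      key=lambda d: device_speeds[d], reverse=True)
    return {by_speed[k]: partitions[p] for k, p in enumerate(by_cost)}

# Example: 4 layers split across 2 devices; device 1 is twice as fast,
# so it receives the heavier partition (layers 0-1, cost 6 vs. 4).
plan = assign_partitions([4, 2, 3, 1], [1.0, 2.0], num_partitions=2)
```

A real scheduler would also weigh memory limits, link bandwidth between devices, and runtime overload, which is where the paper's heuristic and RL approaches come in.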
