Abstract

Efforts to leverage the benefits of Deep Learning (DL) models for inference on resource-constrained embedded devices have become widespread. Researchers worldwide are developing software and hardware accelerators that make pre-trained DL models suitable for running the inference phase on these devices. Beyond software and hardware acceleration, partitioning DL models and offloading parts of them to Cloud or Edge network servers is becoming increasingly practicable as Edge Computing gains importance. Partitioning and offloading the DL inference workflow can augment software/hardware acceleration, improving latency and energy efficiency in resource-constrained embedded systems. The efficacy of a computation offloading system depends on accurate profiling of the time and energy required to process DL algorithms. In this work we implement a DL inference offloading system on a Raspberry Pi 3-based robot vehicle with an Intel Neural Compute Stick hardware accelerator. We report the workload partitioning approach, detailed experimental results, and the performance improvements achieved. We demonstrate that the current approach of profiling DL execution without considering the dynamic system load of the edge device results in sub-optimal partitioning of the DL algorithm, and we propose a solution approach to address this.
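To make the partitioning idea concrete, the following is a minimal sketch (not the paper's implementation) of load-aware partition-point selection for a linear chain of DL layers. The layer names, profile figures, bandwidth value, and the linear load-inflation model are all illustrative assumptions.

```python
# Minimal sketch of load-aware DNN partition-point selection.
# All numbers and the load model below are hypothetical, for illustration only.

from dataclasses import dataclass

@dataclass
class LayerProfile:
    name: str
    device_ms: float   # profiled on-device execution time when idle
    server_ms: float   # profiled execution time on the edge server
    output_kb: float   # activation size to transmit if the model is cut here

def best_partition(layers, input_kb, bandwidth_kbps, device_load=0.0):
    """Pick the cut index minimizing estimated end-to-end latency.

    Layers [0, cut) run locally and [cut, n) run on the server.
    device_load in [0, 1) inflates local latency to model the dynamic
    system load that static, idle-time profiling ignores (a simple
    linear model assumed here).
    """
    inflate = 1.0 / (1.0 - device_load)
    n = len(layers)
    best_cut, best_ms = 0, float("inf")
    for cut in range(n + 1):
        local_ms = inflate * sum(l.device_ms for l in layers[:cut])
        remote_ms = sum(l.server_ms for l in layers[cut:])
        if cut == n:                      # fully local: nothing to transmit
            tx_ms = 0.0
        else:                             # send raw input or the cut activation
            kb = input_kb if cut == 0 else layers[cut - 1].output_kb
            tx_ms = kb / bandwidth_kbps * 1000.0
        total = local_ms + tx_ms + remote_ms
        if total < best_ms:
            best_cut, best_ms = cut, total
    return best_cut, best_ms

if __name__ == "__main__":
    # Hypothetical three-layer profile: the fully connected layer is
    # expensive on the embedded device but cheap on the server.
    net = [
        LayerProfile("conv1", device_ms=40.0,  server_ms=4.0,  output_kb=800.0),
        LayerProfile("pool1", device_ms=5.0,   server_ms=0.5,  output_kb=200.0),
        LayerProfile("fc1",   device_ms=200.0, server_ms=20.0, output_kb=4.0),
    ]
    for load in (0.0, 0.5):
        cut, ms = best_partition(net, input_kb=600.0,
                                 bandwidth_kbps=1000.0, device_load=load)
        print(f"load={load:.1f}: cut after layer index {cut}, est. {ms:.1f} ms")
```

With these assumed numbers, the idle device runs the whole network locally, but at 50% load the inflated on-device cost moves the optimal cut earlier so the expensive layer is offloaded; this is exactly the effect a static execution profile misses.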
