Abstract

Deep learning is changing the way mobile and embedded devices see, hear, and understand the world. When deep learning models are deployed on such systems, they are expected to perform inference tasks in a timely and energy-efficient manner. Much research has focused on taming deep learning for resource-constrained devices, either by compressing deep learning models or by devising hardware accelerators. However, these approaches provide only ‘best-effort’ performance on such devices. In this paper, we present the design and implementation of DeepRT, a novel deep learning inference runtime. Unlike previous approaches, DeepRT focuses on supporting predictable temporal and spatial inference performance when deep learning models are used in unpredictable and resource-constrained environments. In particular, DeepRT applies formal control theory to Quality-of-Service (QoS) management that dynamically minimizes the tardiness of inference tasks at runtime while achieving high energy efficiency. Further, DeepRT determines a proper level of compression for deep learning models at runtime according to memory availability and users’ QoS requirements, striking a balance between memory savings and loss of inference accuracy. We evaluate DeepRT on a wide range of deep learning models under various conditions. The experimental results show that DeepRT supports the timeliness of inference tasks in a robust and energy-efficient manner.


