Abstract

AI-powered mobile applications are becoming increasingly popular due to recent advances in machine intelligence. They include, but are not limited to, mobile sensing, virtual assistants, and augmented reality. Mobile AI models, especially Deep Neural Networks (DNNs), are usually executed locally, as sensory data are collected and generated by end devices. This imposes a heavy computational burden on resource-constrained mobile phones. Typically, a set of DNN jobs with deadline constraints awaits execution. Existing AI inference frameworks process incoming DNN jobs in sequential order, which does not optimally support mobile users' real-time interactions with AI services. In this paper, we propose a framework that achieves real-time inference by exploiting heterogeneous mobile SoCs, which contain both a CPU and a GPU. Considering the characteristics of DNN models, we optimally partition execution between the mobile GPU and CPU. We present a dynamic programming-based approach to solve the formulated real-time DNN partitioning and scheduling problem. The proposed framework has several desirable properties: 1) computational resources on mobile devices are better utilized; 2) inference performance is optimized in terms of deadline miss rate; 3) no sacrifices in inference accuracy are made. Evaluation results on an off-the-shelf mobile phone show that our proposed framework provides better real-time support for AI inference tasks on mobile platforms than several baselines.
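To make the scheduling idea concrete, the following is a minimal sketch (not the paper's exact algorithm) of a dynamic-programming assignment of deadline-constrained DNN jobs to a CPU/GPU pair. It assumes a simplified model in which each job runs entirely on one processor, the two processors run in parallel, and jobs are considered in earliest-deadline-first order; the function name, job tuples, and timing values are all hypothetical.

```python
def min_deadline_misses(jobs):
    """jobs: list of (cpu_time, gpu_time, deadline) tuples.

    Each processor runs its assigned jobs back to back, and the CPU
    and GPU run in parallel. Returns the minimum achievable number
    of deadline misses over all CPU/GPU assignments.
    """
    jobs = sorted(jobs, key=lambda j: j[2])  # earliest deadline first
    # DP state: (cpu_free_time, gpu_free_time) -> fewest misses so far
    states = {(0, 0): 0}
    for cpu_t, gpu_t, deadline in jobs:
        nxt = {}
        for (cpu_free, gpu_free), misses in states.items():
            # Option 1: run this job on the CPU.
            finish = cpu_free + cpu_t
            key = (finish, gpu_free)
            m = misses + (finish > deadline)
            if nxt.get(key, float("inf")) > m:
                nxt[key] = m
            # Option 2: run this job on the GPU.
            finish = gpu_free + gpu_t
            key = (cpu_free, finish)
            m = misses + (finish > deadline)
            if nxt.get(key, float("inf")) > m:
                nxt[key] = m
        states = nxt
    return min(states.values())
```

For example, three jobs with (cpu_time, gpu_time, deadline) of (4, 2, 3), (4, 2, 4), and (4, 2, 5) all miss their deadlines if run sequentially on one processor, but interleaving them across the CPU and GPU meets every deadline; the DP finds such an assignment by exploring both choices per job while keeping only the best miss count per resulting state.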
