Automated Backend Allocation for Multi-Model, On-Device AI Inference

Venkatraman Iyer, Sungho Lee, Hyunjun Kim, Juitem Joonwoo Kim, Youngjae Shin, Semun Lee

doi:10.1145/3673660.3655046

Abstract

On-device artificial intelligence (AI) services such as face recognition, object tracking, and voice recognition are rapidly being deployed at scale on embedded, memory-constrained hardware devices. These services typically delegate their inference models for execution on CPU and GPU computing backends. While GPU delegation is common practice for achieving high-speed computation, it suffers from degraded throughput and longer completion times in multi-model scenarios, i.e., when several services execute concurrently. This paper introduces a solution that sustains performance in multi-model, on-device AI contexts by dynamically allocating a combination of CPU and GPU backends to each model. The allocation is feedback-driven and guided by knowledge of model-specific, multi-objective Pareto fronts over inference latency and memory consumption. Our backend allocation algorithm runs online for each model and achieves a 25-100% improvement in throughput over static allocations as well as over load-balancing scheduler solutions that target multi-model scenarios.
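The abstract describes the allocation only at a high level. As a rough illustration, and not the authors' algorithm, the sketch below assumes each model has been profiled offline over a few hypothetical CPU/GPU splits (`BackendConfig`, with an assumed `gpu_fraction` field), extracts the latency-memory Pareto front, and then runs a simple per-model feedback loop (`FeedbackAllocator`, with an assumed `slack` threshold). The loop steps toward a faster point when measured latency misses a target and toward a leaner point when there is ample headroom. All names, thresholds, and numbers here are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackendConfig:
    """One hypothetical CPU/GPU split for a single model (profiled offline)."""
    gpu_fraction: float  # assumed share of the model's layers delegated to the GPU
    latency_ms: float    # standalone inference latency measured during profiling
    memory_mb: float     # memory footprint measured during profiling


def pareto_front(configs: List[BackendConfig]) -> List[BackendConfig]:
    """Return the configs not dominated in (latency_ms, memory_mb), fastest first.

    A config is dominated if another one is no worse on both objectives and
    strictly better on at least one; both objectives are minimized.
    """
    front = []
    for c in configs:
        dominated = any(
            o.latency_ms <= c.latency_ms
            and o.memory_mb <= c.memory_mb
            and (o.latency_ms < c.latency_ms or o.memory_mb < c.memory_mb)
            for o in configs
        )
        if not dominated:
            front.append(c)
    # Along the sorted front, latency is non-decreasing and memory non-increasing.
    return sorted(front, key=lambda c: c.latency_ms)


class FeedbackAllocator:
    """Hypothetical per-model controller that walks the Pareto front online.

    Too slow: step to a faster, more memory-hungry point.
    Ample headroom: step to a leaner point to free memory for other models.
    """

    def __init__(self, front: List[BackendConfig], target_ms: float,
                 memory_budget_mb: float, slack: float = 0.5) -> None:
        # Discard points that would exceed this model's memory budget.
        self.front = [c for c in front if c.memory_mb <= memory_budget_mb]
        if not self.front:
            raise ValueError("no configuration fits the memory budget")
        self.target_ms = target_ms
        self.slack = slack  # assumed headroom threshold, as a fraction of target_ms
        self.index = len(self.front) - 1  # start at the leanest feasible point

    @property
    def current(self) -> BackendConfig:
        return self.front[self.index]

    def observe(self, measured_ms: float) -> BackendConfig:
        """Feed back one measured inference latency and return the next config."""
        last = len(self.front) - 1
        if measured_ms > self.target_ms and self.index > 0:
            self.index -= 1  # missing the target: trade memory for latency
        elif measured_ms < self.slack * self.target_ms and self.index < last:
            self.index += 1  # ample headroom: release memory
        return self.current


if __name__ == "__main__":
    profiled = [
        BackendConfig(gpu_fraction=0.0, latency_ms=40.0, memory_mb=60.0),
        BackendConfig(gpu_fraction=0.5, latency_ms=25.0, memory_mb=90.0),
        BackendConfig(gpu_fraction=0.75, latency_ms=30.0, memory_mb=120.0),  # dominated
        BackendConfig(gpu_fraction=1.0, latency_ms=12.0, memory_mb=150.0),
    ]
    front = pareto_front(profiled)
    alloc = FeedbackAllocator(front, target_ms=35.0, memory_budget_mb=200.0)
    for measured in (48.0, 70.0, 10.0):  # e.g. latencies observed under contention
        cfg = alloc.observe(measured)
        print(f"measured {measured} ms -> gpu_fraction {cfg.gpu_fraction}")
```

In a multi-model setting, one such allocator would be instantiated per concurrently running model and fed that model's own measured latency. The hill-climbing policy is a deliberately simple stand-in for whatever control logic the paper actually uses.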

Published in: ACM SIGMETRICS Performance Evaluation Review

Similar Papers

Automated Backend Allocation for Multi-Model, On-Device AI Inference
Venkatraman Iyer ... Semun Lee
Proceedings of the ACM on Measurement and Analysis of Computing Systems | VOL. 7
07 Dec 2023

Investigating potential tourists' attitudes toward artificial intelligence services: a market segmentation approach
Ja Young (Jacey) Choe ... Raymond Adongo
Journal of Hospitality and Tourism Insights | VOL. 7
10 Oct 2023

Exploring Consumer-Robot interaction in the hospitality sector: Unpacking the reasons for adoption (or resistance) to artificial intelligence
Hafiz Muhammad Wasif Rasheed ... Hafiz Syed Mohsin Abbas
Technological Forecasting and Social Change | VOL. 192
05 Apr 2023

Artificial intelligence service recovery: The role of empathic response in hospitality customers’ continuous usage intention
Xingyang Lv ... Hong Xu
Computers in Human Behavior | VOL. 126
21 Aug 2021