Abstract

Edge AI is an emerging paradigm that leverages edge computing to pave the last mile of artificial intelligence delivery. To satisfy the stringent timeliness and energy-efficiency requirements of emerging edge AI tasks, specialized AI accelerators, namely Neural Processing Units (NPUs), have been widely deployed on edge nodes. Compared to the traditional central processing unit (CPU), the NPU offers better performance and energy efficiency, but these benefits come at the cost of reduced inference accuracy. As a result, existing coarse-grained scheduling mechanisms that assign a whole DNN task to either the CPU or the NPU cannot make the best use of the NPU. To address this issue, we propose an online NPU-CPU co-inference scheduling mechanism that schedules a DNN task at the fine-grained layer level, thereby fully exploiting the performance, accuracy, and power diversities of the NPU and CPU. By applying Lyapunov optimization to schedule the network layers dynamically, the proposed online mechanism guarantees real-time inference speed and caps the long-term time-averaged power consumption, while approximately minimizing the long-term inference accuracy loss. Through rigorous theoretical analysis as well as realistic trace-driven simulations, we demonstrate the effectiveness of the proposed online scheduling mechanism.
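To make the Lyapunov-based layer scheduling described above concrete, the sketch below illustrates the standard drift-plus-penalty pattern: a virtual queue tracks accumulated power overshoot against a long-term budget, and each layer is assigned to the NPU or CPU by minimizing a weighted sum of accuracy-loss penalty and queue-scaled power, subject to the remaining latency slack. All per-layer latency, power, and accuracy-loss numbers, as well as the budget, deadline, and control parameter V, are hypothetical placeholders for illustration, not the paper's actual cost model or interface.

```python
# Minimal drift-plus-penalty sketch for per-layer NPU/CPU assignment.
# All numbers below are hypothetical; the paper's cost model is richer.

from dataclasses import dataclass

@dataclass
class LayerProfile:
    latency: dict   # seconds on each processor, keyed by "cpu" / "npu"
    power: dict     # watts on each processor
    acc_loss: dict  # accuracy-loss penalty on each processor

def schedule_layer(layer, q_power, p_budget, slack, v):
    """Pick a processor for one layer by minimizing the drift-plus-penalty
    objective V * acc_loss + Q(t) * (power - budget), subject to the layer's
    latency fitting in the remaining deadline slack."""
    best, best_cost = None, float("inf")
    for proc in ("cpu", "npu"):
        if layer.latency[proc] > slack:
            continue  # would violate the real-time constraint
        cost = v * layer.acc_loss[proc] + q_power * (layer.power[proc] - p_budget)
        if cost < best_cost:
            best, best_cost = proc, cost
    return best or "npu"  # fall back to the faster processor if neither fits

def run_task(layers, p_budget=2.0, deadline=0.05, v=10.0):
    """Schedule a DNN task layer by layer, updating the virtual power queue
    Q(t+1) = max(Q(t) + power(t) - budget, 0) after each decision."""
    q_power, slack, plan = 0.0, deadline, []
    for layer in layers:
        proc = schedule_layer(layer, q_power, p_budget, slack, v)
        plan.append(proc)
        slack -= layer.latency[proc]
        q_power = max(q_power + layer.power[proc] - p_budget, 0.0)
    return plan

# Example: the NPU is faster and more power-efficient but incurs a
# (hypothetical) accuracy-loss penalty on each layer.
layers = [
    LayerProfile(latency={"cpu": 0.010, "npu": 0.002},
                 power={"cpu": 3.0, "npu": 1.0},
                 acc_loss={"cpu": 0.0, "npu": 0.010}),
    LayerProfile(latency={"cpu": 0.015, "npu": 0.003},
                 power={"cpu": 3.5, "npu": 1.2},
                 acc_loss={"cpu": 0.0, "npu": 0.030}),
    LayerProfile(latency={"cpu": 0.008, "npu": 0.002},
                 power={"cpu": 2.8, "npu": 0.9},
                 acc_loss={"cpu": 0.0, "npu": 0.005}),
]
print(run_task(layers))  # -> ['cpu', 'npu', 'npu'] with these placeholder numbers
```

Note how the virtual queue mediates the accuracy-power trade-off: after a power-hungry CPU decision the queue grows, making subsequent NPU assignments cheaper in the objective, which is how the time-averaged power constraint is enforced without per-slot hard caps.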
