Abstract

Neural Network (NN)-based real-time inference tasks are often co-scheduled on GPGPU-style edge platforms. Existing works advocate using different NN parameters for the same detection task in different environments. However, realizing such approaches remains challenging given the limited on-chip memory capacity of accelerator devices. As a solution, we propose a multi-pass, time- and space-aware scheduling infrastructure for embedded platforms with GPU accelerators. The framework manages the residency of NN parameters in the limited on-chip memory while simultaneously dispatching the relevant compute operations. The mapping of memory and compute operations to the underlying resources of the platform is first determined offline. For this, we propose a constraint solver-assisted scheduler that optimizes the schedule makespan. This is followed by memory optimization passes, which take the memory budget into account and adjust the start times of memory and compute operations accordingly. Our approach achieves 74%–90% savings in peak memory utilization with 0%–33% deadline misses, for schedules that suffer deadline-miss rates of 25%–100% when run using existing methods.
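
The following is a minimal sketch, not the paper's implementation, of how an offline constraint-solver scheduler can minimize makespan while respecting the ordering between parameter-load (memory) operations and their dependent compute operations. The task names, durations, and the single-copy-engine / single-compute-engine assumption are hypothetical; the example uses Google OR-Tools CP-SAT purely for illustration.

    # Hypothetical example: offline makespan minimization with CP-SAT.
    from ortools.sat.python import cp_model

    # (memory-load op, compute op) pairs: each compute op needs its NN
    # parameters resident on-chip before it can start. Durations are in
    # arbitrary time units (illustrative values only).
    tasks = {"load_A": 4, "run_A": 7, "load_B": 3, "run_B": 5}
    pairs = [("load_A", "run_A"), ("load_B", "run_B")]

    model = cp_model.CpModel()
    horizon = sum(tasks.values())

    start, end, interval = {}, {}, {}
    for name, dur in tasks.items():
        start[name] = model.NewIntVar(0, horizon, f"s_{name}")
        end[name] = model.NewIntVar(0, horizon, f"e_{name}")
        interval[name] = model.NewIntervalVar(start[name], dur, end[name], name)

    # Assume memory loads share one copy engine and compute ops share one
    # GPU engine, so each class of operations must be serialized.
    model.AddNoOverlap([interval[n] for n in tasks if n.startswith("load")])
    model.AddNoOverlap([interval[n] for n in tasks if n.startswith("run")])

    # Parameters must finish loading before the corresponding compute op runs.
    for load, run in pairs:
        model.Add(end[load] <= start[run])

    # Minimize the makespan (completion time of the last operation).
    makespan = model.NewIntVar(0, horizon, "makespan")
    model.AddMaxEquality(makespan, [end[n] for n in tasks])
    model.Minimize(makespan)

    solver = cp_model.CpSolver()
    if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        for name in tasks:
            print(name, solver.Value(start[name]), "->", solver.Value(end[name]))
        print("makespan:", solver.Value(makespan))

In the paper's framework, a schedule of this kind would then be refined by memory optimization passes that shift operation start times to keep peak parameter residency within the on-chip memory budget; that refinement is not shown here.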
