Deep Neural Networks (DNNs) are central to modern intelligent processing, but their high computational demands cause significant latency and energy consumption on mobile devices. Moreover, different tasks place different accuracy demands on DNN inference. To balance latency and accuracy across tasks, we introduce PreDNNOff, a method that offloads DNNs at layer granularity within the Mobile Edge Computing (MEC) environment. PreDNNOff formulates offloading as a binary stochastic programming model and applies Genetic Algorithms (GAs) to minimize the expected latency across multiple exit points, using the distribution of task inference-accuracy requirements and per-layer latency regression models. Compared with the existing method Edgent, PreDNNOff reduces the expected total latency by about 10%, and because it accounts for tasks' varying accuracy requirements, it is more broadly applicable.
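The abstract does not spell out the optimization procedure, so the following is only a rough sketch of the kind of GA-based search it describes: choosing, per task class, an early-exit point and a device/edge partition layer to minimize expected latency subject to accuracy requirements. All latency figures, exit accuracies, the task mix, and every function name here are hypothetical illustrations, not the paper's actual model or data.

```python
import random

# Hypothetical per-layer latencies (ms) for a 6-layer DNN: device vs. edge
# execution, plus the cost of uploading the intermediate tensor if we cut
# before layer i (index 6 = run everything on device, nothing uploaded).
DEVICE_LAT = [4.0, 6.0, 8.0, 8.0, 5.0, 3.0]
EDGE_LAT   = [0.8, 1.2, 1.6, 1.6, 1.0, 0.6]
UPLOAD_LAT = [9.0, 7.0, 5.0, 3.0, 2.0, 1.0, 0.5]

# Hypothetical early-exit points: (last layer executed, accuracy at that exit).
EXITS = [(2, 0.80), (4, 0.90), (5, 0.95)]

# Assumed distribution of task accuracy requirements: (required accuracy, probability).
TASK_MIX = [(0.75, 0.3), (0.85, 0.4), (0.93, 0.3)]

def latency(exit_idx, cut):
    """Latency when layers [0, cut) run on device and [cut, exit] on the edge."""
    last, _ = EXITS[exit_idx]
    on_device = sum(DEVICE_LAT[:min(cut, last + 1)])
    on_edge = sum(EDGE_LAT[cut:last + 1])
    upload = UPLOAD_LAT[cut] if cut <= last else 0.0
    return on_device + upload + on_edge

def fitness(chromosome):
    """Expected latency over the task mix; infeasible genes get a large penalty."""
    total = 0.0
    for (req, prob), (exit_idx, cut) in zip(TASK_MIX, chromosome):
        _, acc = EXITS[exit_idx]
        if acc < req:                     # exit too shallow for this task class
            total += prob * 1e6           # penalty steers the GA toward feasibility
        else:
            total += prob * latency(exit_idx, cut)
    return total

def random_gene():
    e = random.randrange(len(EXITS))
    cut = random.randrange(EXITS[e][0] + 2)  # cut point in [0, last_layer + 1]
    return (e, cut)

def evolve(pop_size=40, generations=100, mutate_p=0.2):
    pop = [[random_gene() for _ in TASK_MIX] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            child = [random.choice(g) for g in zip(a, b)]  # uniform crossover
            if random.random() < mutate_p:
                child[random.randrange(len(child))] = random_gene()
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

best = evolve()
print("per-task-class (exit, cut):", best, "expected latency:", round(fitness(best), 2))
```

In this toy setting the GA tends to pick deeper exits only for the task classes that need them and to cut early when upload costs are low, which mirrors the trade-off the abstract describes between expected latency and per-task accuracy requirements.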