Abstract

As deep neural network (DNN) tasks structured as directed acyclic graphs (DAGs) proliferate in modern industries, simultaneously meeting their latency and accuracy requirements remains elusive. A common solution is to find an optimal DNN partition that combines cloud and edge computing, enabling edge–cloud collaborative inference. However, dynamic network conditions and the uncertain availability of cloud computing resources pose formidable obstacles to such collaboration. In this study, we formulate DNN partitioning as a minimum cut problem on a DAG derived from the DNN and propose an early-exit DAG-DNN inference (EDDI) framework that synergistically supports on-demand inference acceleration for DAG-structured DNNs. The framework introduces two novel components: (1) an Evaluator that derives an approximately optimal solution by constructing the DNN into an appropriate DAG, assisting in solving the optimization problem; and (2) an Optimizer that jointly optimizes the early-exit and DNN partition strategies at run time to improve performance while meeting user-defined latency requirements. Quantitative evaluations show that EDDI outperforms state-of-the-art schemes by 10.6%, 3.3%, and 3% on average in model accuracy, inference latency, and throughput, respectively, under diverse latency constraints. Meanwhile, the latency speedup ratio increases by an average of 8% and 4% under varying network conditions and cloud server capacities, respectively.
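To illustrate the minimum cut formulation of DNN partitioning referred to above, the sketch below builds a flow network in which cutting a layer's source edge assigns the layer to the cloud and cutting its sink edge keeps it on the edge device; an s–t minimum cut then yields a partition whose cut value approximates the combined compute and transmission latency. This is a minimal sketch, not the paper's actual construction: it uses networkx, the function name min_cut_partition and the profiling inputs t_edge, t_cloud, and t_tx are illustrative placeholders, and intermediate data is assumed to flow only from the edge device toward the cloud.

```python
import networkx as nx

def min_cut_partition(layers, dag_edges, t_edge, t_cloud, t_tx):
    """Partition DNN layers between an edge device and the cloud via s-t min cut.

    layers:    iterable of layer names
    dag_edges: list of (u, v) pairs giving the DNN's DAG connectivity
    t_edge:    {layer: latency if executed on the edge device}   (assumed profile)
    t_cloud:   {layer: latency if executed on the cloud}         (assumed profile)
    t_tx:      {(u, v): latency to ship u's output to the cloud} (assumed profile)
    """
    G = nx.DiGraph()
    s, t = "EDGE", "CLOUD"
    for v in layers:
        # Cutting s->v places v on the cloud side (we pay its cloud latency);
        # cutting v->t keeps v on the edge device (we pay its edge latency).
        G.add_edge(s, v, capacity=t_cloud[v])
        G.add_edge(v, t, capacity=t_edge[v])
    for u, v in dag_edges:
        # A DAG edge that crosses the cut (u on the edge device, v on the cloud)
        # incurs the transmission latency of u's output tensor.
        G.add_edge(u, v, capacity=t_tx[(u, v)])

    cut_value, (edge_side, cloud_side) = nx.minimum_cut(G, s, t)
    return cut_value, edge_side - {s}, cloud_side - {t}
```

For example, a three-layer chain with t_edge = {"conv1": 2, "conv2": 8, "fc": 6}, t_cloud = {"conv1": 1, "conv2": 1, "fc": 1}, and small transmission costs would place the early layer on the device and offload the heavier layers, which is the qualitative behavior the min-cut formulation is meant to capture.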
