Abstract

As deep neural network (DNN) tasks structured as directed acyclic graphs (DAGs) proliferate in modern industries, simultaneously meeting their latency and accuracy requirements remains elusive. A common solution is to find an optimal DNN partition that combines cloud and edge computing, enabling edge–cloud collaborative inference. However, dynamic network conditions and the uncertain availability of cloud computing resources pose formidable obstacles to such collaboration. In this study, we formulate DNN partitioning as a minimum cut problem on a DAG derived from the DNN and propose an early-exit DAG-DNN inference (EDDI) framework that synergistically supports on-demand inference acceleration for DAG-structured DNNs. The framework introduces two novel components: (1) an Evaluator that derives an approximately optimal solution by constructing the DNN into an appropriate DAG, assisting in solving the optimization problem; and (2) an Optimizer that jointly optimizes the early-exit and DNN partition strategies at run time to improve performance while meeting user-defined latency requirements. Quantitative evaluations show that EDDI outperforms state-of-the-art schemes by 10.6%, 3.3%, and 3% on average in model accuracy, inference latency, and throughput, respectively, under diverse latency constraints. Meanwhile, the latency speedup ratio increases by an average of 8% and 4% under varying network conditions and cloud server capacities, respectively.
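To illustrate the minimum cut formulation of DNN partitioning referred to above, the sketch below builds a flow network in which cutting a layer's source edge assigns the layer to the cloud and cutting its sink edge keeps it on the edge device; an s–t minimum cut then yields a partition whose cut value approximates the combined compute and transmission latency. This is a minimal sketch, not the paper's actual construction: it uses networkx, the function name min_cut_partition and the profiling inputs t_edge, t_cloud, and t_tx are illustrative placeholders, and intermediate data is assumed to flow only from the edge device toward the cloud.

```python
import networkx as nx

def min_cut_partition(layers, dag_edges, t_edge, t_cloud, t_tx):
    """Partition DNN layers between an edge device and the cloud via s-t min cut.

    layers:    iterable of layer names
    dag_edges: list of (u, v) pairs giving the DNN's DAG connectivity
    t_edge:    {layer: latency if executed on the edge device}   (assumed profile)
    t_cloud:   {layer: latency if executed on the cloud}         (assumed profile)
    t_tx:      {(u, v): latency to ship u's output to the cloud} (assumed profile)
    """
    G = nx.DiGraph()
    s, t = "EDGE", "CLOUD"
    for v in layers:
        # Cutting s->v places v on the cloud side (we pay its cloud latency);
        # cutting v->t keeps v on the edge device (we pay its edge latency).
        G.add_edge(s, v, capacity=t_cloud[v])
        G.add_edge(v, t, capacity=t_edge[v])
    for u, v in dag_edges:
        # A DAG edge that crosses the cut (u on the edge device, v on the cloud)
        # incurs the transmission latency of u's output tensor.
        G.add_edge(u, v, capacity=t_tx[(u, v)])

    cut_value, (edge_side, cloud_side) = nx.minimum_cut(G, s, t)
    return cut_value, edge_side - {s}, cloud_side - {t}
```

For example, a three-layer chain with t_edge = {"conv1": 2, "conv2": 8, "fc": 6}, t_cloud = {"conv1": 1, "conv2": 1, "fc": 1}, and small transmission costs would place the early layer on the device and offload the heavier layers, which is the qualitative behavior the min-cut formulation is meant to capture.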
