Abstract
Widely used Internet-of-Things (IoT) mobile devices (MDs) generate huge volumes of data, from which accurate information must be extracted in real time by compute-intensive deep learning (DL) inference tasks. Because of its multilayer structure, a deep neural network (DNN) is well suited to the mobile-edge computing (MEC) environment: DL tasks can be offloaded to DNN partitions deployed on MEC servers (MECSs) to speed up inference. In this article, we first model the arrival of DL tasks as a Poisson process and develop a tandem queueing model to evaluate the end-to-end (E2E) inference delay of DL tasks across multiple DNN partitions. To minimize the E2E delay, we formulate a joint optimization problem of partition deployment and resource allocation in MECSs (JPDRA). Since the JPDRA is a mixed-integer nonlinear programming (MINLP) problem, we decompose it into a computing resource allocation (CRA) problem with a fixed partition deployment decision and a DNN partition deployment (DPD) problem that optimizes the optimal-delay function obtained from the CRA problem. Next, we design a CRA algorithm based on Markov approximation and a low-complexity DPD algorithm to obtain a near-optimal solution in polynomial time. The simulation results demonstrate that the proposed algorithms are more efficient and can reduce the average E2E delay by 25.7% with better convergence performance.
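As a rough illustration of how a tandem queueing model yields an E2E delay, the sketch below treats each DNN partition as an M/M/1 stage in series. The abstract states only that arrivals are Poisson; the exponential service times, the omission of transmission delay between partitions, and the function name `tandem_mm1_delay` are assumptions made here for illustration, not the article's actual model. Under these assumptions, Burke's theorem implies that the departures of each stable stage again form a Poisson process with the same rate, so the mean E2E delay is the sum of the per-stage mean sojourn times 1 / (mu_i - lambda).

```python
from typing import Sequence


def tandem_mm1_delay(arrival_rate: float, service_rates: Sequence[float]) -> float:
    """Mean end-to-end sojourn time through M/M/1 stages in series.

    arrival_rate  -- Poisson task arrival rate (tasks per second)
    service_rates -- service rate of each stage (tasks per second); in this
                     sketch it stands in for the computing resource allocated
                     to the corresponding DNN partition (an assumption)
    """
    total = 0.0
    for mu in service_rates:
        # Each stage must be stable, otherwise its queue grows without bound.
        if mu <= arrival_rate:
            raise ValueError("unstable stage: service rate must exceed arrival rate")
        # Mean sojourn time (waiting plus service) of a single M/M/1 queue.
        total += 1.0 / (mu - arrival_rate)
    return total


if __name__ == "__main__":
    # Three hypothetical partitions with different allocated service rates.
    print(tandem_mm1_delay(2.0, [5.0, 4.0, 6.0]))
```

In this simplified picture, allocating more computing resource to a partition raises its service rate and lowers its term in the sum, which suggests why resource allocation and partition deployment interact in the delay objective.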