Abstract

User equipment produces a series of tasks that are processed locally or remotely, falling into three categories: (i) the entire task is computed locally, (ii) a fraction of the task is computed locally and the remainder is offloaded for remote computation, and (iii) the entire task is offloaded. Each case has attracted substantial attention in recent studies, where a delay-constrained non-linear optimization problem is typically formulated and solved via Lagrange duality, heuristic search, or dynamic programming. To our knowledge, no unifying task-processing orchestrator exists that solves the model-free problem online and encapsulates all three cases. We fill this gap and present the first energy-efficiency-aware actor-critic reinforcement learning approach that computes asymptotically optimal solutions by decomposing the comprehensive optimization into sub-problems. Rigorous theoretical analyses and experience-driven simulations demonstrate significant advantages over benchmark approaches in terms of task-processing delay, power efficiency, and convergence time.
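To make the three-case formulation concrete, the following is a minimal sketch of the kind of actor-critic loop the abstract describes, not the authors' algorithm: the state features, the toy delay/energy cost model, and the hyperparameters below are illustrative assumptions. The three discrete actions correspond to local-only computing, partial offloading, and full offloading.

```python
# Minimal actor-critic sketch for the three offloading cases.
# NOTE: the cost model, features, and learning rates are hypothetical
# placeholders, not the paper's formulation.
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 3                          # local only, partial offload, full offload
OFFLOAD_FRAC = np.array([0.0, 0.5, 1.0])

theta = np.zeros((N_ACTIONS, 2))       # actor: linear softmax policy weights
w = np.zeros(2)                        # critic: linear state-value weights
ALPHA_ACTOR, ALPHA_CRITIC, GAMMA = 0.01, 0.05, 0.9

def features(task_size, channel_gain):
    """Hypothetical 2-d state: normalized task size and channel quality."""
    return np.array([task_size, channel_gain])

def cost(task_size, channel_gain, frac):
    """Toy delay-plus-energy cost; a stand-in for the delay-constrained
    non-linear objective mentioned in the abstract."""
    local_delay = (1.0 - frac) * task_size              # local computing time
    tx_delay = frac * task_size / (0.1 + channel_gain)  # offloading time
    energy = (1.0 - frac) * task_size + 0.5 * frac * task_size
    return max(local_delay, tx_delay) + 0.3 * energy

for step in range(5000):
    s = features(rng.uniform(0.2, 1.0), rng.uniform(0.1, 1.0))
    logits = theta @ s
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    a = rng.choice(N_ACTIONS, p=probs)                  # sample offloading case
    r = -cost(s[0], s[1], OFFLOAD_FRAC[a])              # reward = negative cost

    s_next = features(rng.uniform(0.2, 1.0), rng.uniform(0.1, 1.0))
    td_error = r + GAMMA * (w @ s_next) - (w @ s)       # one-step TD error
    w += ALPHA_CRITIC * td_error * s                    # critic update
    grad = -np.outer(probs, s)                          # d log pi(a|s) / d theta
    grad[a] += s
    theta += ALPHA_ACTOR * td_error * grad              # actor update
```

The critic's TD error serves as the advantage signal for the actor, so the policy gradually shifts probability toward whichever of the three processing cases yields the lowest combined delay and energy cost for a given state.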
