Abstract
This work addresses task offloading for multiple cellular edge devices served by a multi-access edge computing (MEC) infrastructure attached to a base station (BS). To minimize the overall task computing-communication delay while coping on the fly with time-varying cost and constraint functions whose statistics are unknown, we propose a novel distributed bandit optimization (DBO) algorithm. It relies on projected dual gradient iterations and on a single broadcast that communicates the MEC states to the edge devices (SDs) at the end of each time slot. To track the performance of the proposed online learning algorithm over time, we define a dynamic regret, which measures how close the delay cost incurred by DBO is to that of a clairvoyant dynamic optimum, and an aggregate violation metric, which evaluates the asymptotic satisfaction of the constraints. We derive lower and upper bounds on the dynamic regret and an upper bound on the aggregate violation, and show that the upper bounds are sublinear when the accumulated hindsight variations are sublinear. Simulation results and comparisons confirm the effectiveness of the proposed algorithm in the long run.
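For orientation, the two performance metrics and the dual iteration named in the abstract are commonly written as follows in the constrained online optimization literature. This is a sketch under assumed notation, not the paper's own formulation: the symbols $x_t$, $f_t$, $g_t$, $\mathcal{X}$, $\lambda_t$ and $\mu$ are introduced here for illustration only, and the paper's exact definitions (in particular how the bandit feedback enters the primal update) may differ.

```latex
% Assumed notation (hypothetical, not taken from the paper):
%   x_t         offloading decision in time slot t
%   f_t         time-varying delay cost
%   g_t         vector of time-varying constraint functions
%   \mathcal{X} fixed feasible set of decisions
%   \lambda_t   dual variable, \mu > 0 a dual step size
\begin{align}
  x_t^{\ast} &\in \arg\min_{x \in \mathcal{X}} f_t(x)
      \quad \text{s.t. } g_t(x) \le 0
      && \text{(clairvoyant per-slot optimum)} \\
  \mathrm{Reg}_T &= \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x_t^{\ast})
      && \text{(dynamic regret)} \\
  \mathrm{Vio}_T &= \Bigl\| \Bigl[ \sum_{t=1}^{T} g_t(x_t) \Bigr]^{+} \Bigr\|
      && \text{(aggregate violation)} \\
  \lambda_{t+1} &= \bigl[ \lambda_t + \mu\, g_t(x_t) \bigr]^{+}
      && \text{(projected dual gradient step)}
\end{align}
```

Under these definitions, a sublinear upper bound means that $\mathrm{Reg}_T / T$ and $\mathrm{Vio}_T / T$ are bounded by quantities that vanish as $T$ grows, which is the sense in which the constraints are satisfied asymptotically.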