To provide ubiquitous, low-latency communication and computation services for remote and disaster areas, high altitude platform (HAP) and low earth orbit (LEO) satellite integrated multi-access edge computing (HLS-MEC) networks have emerged as a promising solution. However, most existing studies assume a fixed number of connected satellites and neglect to model the time-varying multi-satellite computing process. Motivated by this, we establish an M/G/K(t) queuing model to characterize task computation on satellites. To evaluate the efficiency and quality of computation offloading and splitting, we develop a utility model, defined as the difference between a value function, which assesses the benefits of task offloading in terms of latency reduction and energy saving, and a cost function, which quantifies the latency and energy consumption incurred. After formulating the utility maximization problem, we propose a deep reinforcement learning-based offloading and splitting (DBOS) scheme that copes with the time-varying uncertainties and high dynamics of the HLS-MEC network. Specifically, the DBOS scheme learns the computation offloading and splitting policy that maximizes the utility by sensing the number of connected satellites, the distance between the HAP and the satellites, the available computing resources, and the task arrival rate. Finally, we analyze the computational complexity and convergence of the DBOS scheme. Numerical results show that the DBOS scheme outperforms the three benchmark schemes and maximizes the utility under time-varying dynamics.
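As a rough illustration of the stated utility structure (the symbols below are assumed placeholders for exposition, not the paper's notation), the objective can be sketched as

\[
U(\mathbf{x}) \;=\; V\big(\Delta T(\mathbf{x}),\, \Delta E(\mathbf{x})\big) \;-\; C\big(T(\mathbf{x}),\, E(\mathbf{x})\big),
\]

where \(\mathbf{x}\) denotes an offloading and splitting decision, \(V(\cdot)\) values the latency reduction \(\Delta T\) and energy saving \(\Delta E\) achieved by offloading, and \(C(\cdot)\) prices the latency \(T\) and energy \(E\) actually expended; the DBOS scheme then seeks the policy that maximizes \(U\) under the time-varying network state.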