In satellite-terrestrial integrated networks, it is common practice to schedule real-time tasks from low Earth orbit (LEO) satellites to ground stations (GSs) for data processing. However, joint task scheduling and resource allocation under unknown environment dynamics (e.g., transmission latency) remains a challenging problem. First, the tradeoff between task latency and energy consumption must be considered carefully when making decisions that minimize task latency under time-averaged energy consumption constraints. Second, to learn the environment uncertainties and minimize the loss in system performance (i.e., the regret) in terms of task latency, both online feedback and offline history must be exploited efficiently, and the accompanying exploration-exploitation tradeoff must be handled properly. In this article, we formulate the joint task scheduling and resource allocation problem as a constrained combinatorial multi-armed bandit (CMAB) problem. To solve it, we integrate online learning, online control, and offline historical information into a Task scheduling and Resource allocation scheme with Data-driven Bandit Learning (TRDBL). Our theoretical and numerical results show that TRDBL achieves sublinear time-averaged regret while satisfying the time-averaged energy consumption constraints.
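To make the constrained-CMAB framing concrete, the following is a minimal sketch of one plausible realization, not the authors' algorithm: each candidate ground station is an arm, a UCB-style estimator tracks its unknown latency (warm-started from offline historical samples), and a Lyapunov-style virtual queue enforces the time-averaged energy budget via a drift-plus-penalty score. All names and parameters (e.g., ConstrainedBanditScheduler, v_param) are illustrative assumptions.

```python
# Hypothetical sketch of a constrained combinatorial bandit scheduler in the
# spirit of the abstract; the paper gives no pseudocode here, so this
# structure is an assumption for illustration only.
import math
import random

class ConstrainedBanditScheduler:
    def __init__(self, n_stations, energy_budget, v_param=10.0, history=None):
        self.n = n_stations                  # candidate ground stations (arms)
        self.budget = energy_budget          # time-averaged energy cap per slot
        self.v = v_param                     # latency/energy tradeoff weight
        self.queue = 0.0                     # virtual queue of energy-budget debt
        self.counts = [0] * n_stations       # plays per arm
        self.means = [0.0] * n_stations      # empirical mean latency per arm
        self.t = 0
        # Warm-start estimates from offline historical (station, latency) samples.
        for station, latency in (history or []):
            self._update_mean(station, latency)

    def _update_mean(self, station, latency):
        self.counts[station] += 1
        n = self.counts[station]
        self.means[station] += (latency - self.means[station]) / n

    def select(self, energy_cost):
        """Pick the station minimizing a drift-plus-penalty score built from
        an optimistic (lower-confidence) latency estimate and the queued
        energy debt; unplayed arms are explored once first."""
        self.t += 1
        best, best_score = 0, float("inf")
        for i in range(self.n):
            if self.counts[i] == 0:
                return i
            bonus = math.sqrt(2 * math.log(self.t) / self.counts[i])
            lcb = max(self.means[i] - bonus, 0.0)  # optimism for minimization
            score = self.v * lcb + self.queue * energy_cost[i]
            if score < best_score:
                best, best_score = i, score
        return best

    def feedback(self, station, latency, energy_used):
        """Update latency estimates from online feedback and advance the
        virtual queue by the energy overshoot relative to the budget."""
        self._update_mean(station, latency)
        self.queue = max(self.queue + energy_used - self.budget, 0.0)

# Toy usage with synthetic latencies and energy costs for 3 ground stations.
random.seed(0)
sched = ConstrainedBanditScheduler(3, energy_budget=1.0,
                                   history=[(0, 0.9), (1, 0.5)])
for _ in range(1000):
    costs = [0.8, 1.2, 1.0]
    gs = sched.select(costs)
    obs_latency = random.gauss([0.9, 0.5, 0.7][gs], 0.1)
    sched.feedback(gs, obs_latency, costs[gs])
```

Under standard bandit analysis, the confidence bonus drives the exploration-exploitation tradeoff mentioned above, while the virtual-queue term is what lets a scheme of this shape satisfy a time-averaged constraint rather than a per-slot one.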