This work studies the joint scheduling- admission control (SAC) problem for a single user over a fading channel. Specifically, the SAC problem is formulated as a constrained Markov decision process (MDP) to maximize a utility defined as a function of the throughput and queue size. The optimal throughput- queue size trade-off is investigated. Optimal policies and their structural properties (i.e., monotonicity and convexity) are derived for two models: simultaneous and sequential scheduling and admission control actions. Furthermore, we propose online learning algorithms for the optimal policies for the two models when the statistical knowledge of the time-varying traffic arrival and channel processes is unknown. The analysis and algorithm development are relied on the reformulation of the Bellman's optimality equations using suitably defined state-value functions which can be learned online, at transmission time, using time-averaging. The learning algorithms require less complexity and converge faster than the conventional Q-learning algorithms. This work also builds a connection between the MDP based formulation and the Lyapunov optimization based formulation for the SAC problem. Illustrative results demonstrate the performance of the proposed algorithms in various settings.
Read full abstract