Abstract

In this paper, the drone base-station (DBS) dispatching problem in a multi-cell B5G/6G network is investigated. The main objective is to maximize the system profit by serving the largest possible number of users at the lowest possible cost, while accounting for the uncertain, time-varying user (service) demand in the different cells, the cost of dispatched drones, and the potential profit loss due to unserved users. The problem is formulated as a profit-maximization discounted-return problem. Because of the uncertainty in the demand (users) in each cell, the problem cannot be solved using conventional optimization methods. Hence, it is reformulated as a Markov decision process (MDP). Given the exponential complexity of computing an exact solution and the lack of statistical knowledge about user availability (demand) in the considered regions, we adopt a reinforcement learning (RL) approach based on the state-action-reward-state-action (SARSA) algorithm to solve the MDP efficiently. Simulation results reveal that our RL-based approach significantly increases the overall operator profit by continuously adapting its DBS dispatching strategy to the learned user behavior in the network, which enables serving more users (higher revenue) with fewer DBSs (lower cost).
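To make the on-policy flavor of SARSA concrete, the following is a minimal tabular SARSA sketch on a toy chain MDP. The chain environment, reward of 1 at the goal state, and all hyperparameters here are illustrative assumptions, not the paper's DBS-dispatch state/action/reward model; only the SARSA update rule itself matches the algorithm named in the abstract.

```python
import random

def sarsa(n_states=5, n_actions=2, episodes=500,
          alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular SARSA on a toy chain MDP (hypothetical stand-in for the
    paper's DBS-dispatch MDP)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def step(s, a):
        # Toy dynamics: action 0 moves left, action 1 moves right;
        # reward 1 only on reaching the rightmost (goal) state.
        s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        done = (s2 == n_states - 1)
        return s2, (1.0 if done else 0.0), done

    def policy(s):
        # Epsilon-greedy behavior policy
        if rng.random() < eps:
            return rng.randrange(n_actions)
        return max(range(n_actions), key=lambda a: Q[s][a])

    for _ in range(episodes):
        s, a = 0, policy(0)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = policy(s2)
            # On-policy SARSA update: bootstraps on the action a2
            # the agent will actually take next, not the greedy max.
            target = r + gamma * Q[s2][a2] * (0.0 if done else 1.0)
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s2, a2
    return Q
```

After training, the greedy policy derived from `Q` moves right toward the goal from every non-terminal state; in the paper's setting, the state would instead encode per-cell demand and the action a DBS-dispatch decision.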
