Abstract
This letter studies the revenue maximization problem for a mobile edge computing (MEC) system, where an access point (AP) is equipped with an MEC server that provides a job offloading service for multiple resource-hungry users and charges them a service fee for it. In practice, the information about users' personal demand is unknown and users' job arrival rates are time-varying, which makes pricing highly challenging. We therefore develop a policy gradient (PG)-based reinforcement learning (RL) algorithm. Specifically, a deep neural network (DNN) is adopted as the policy network to design the pricing policy, and a baseline neural network (BNN) is used to reduce the inherently high variance of the gradient obtained via PG. The proposed PG-based algorithm enables continuous pricing, an advance over the conventional Q-learning algorithm, which supports only a discrete action space. Simulation results show that our proposed method converges to the optimal revenue performance, while the Q-learning algorithm suffers a 44% revenue loss.
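The core mechanism described above, a policy gradient update with a learned baseline subtracted from the return to reduce gradient variance, can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses a one-parameter Gaussian pricing policy and a running-average baseline (standing in for the BNN) on a hypothetical toy demand curve, purely to show why continuous pricing and baseline subtraction work together.

```python
import numpy as np

rng = np.random.default_rng(0)

def revenue(price):
    # Hypothetical toy demand curve: demand falls linearly in price.
    demand = max(0.0, 10.0 - price)
    return price * demand

theta = 1.0      # mean of the Gaussian pricing policy (continuous action)
sigma = 0.5      # fixed exploration noise
baseline = 0.0   # running-average baseline (stands in for the paper's BNN)
lr = 0.01        # learning rate

for step in range(5000):
    price = rng.normal(theta, sigma)        # sample a continuous price
    r = revenue(price)
    advantage = r - baseline                # subtracting the baseline cuts variance
    grad_logp = (price - theta) / sigma**2  # score function of the Gaussian policy
    theta += lr * advantage * grad_logp     # REINFORCE ascent step
    baseline += 0.05 * (r - baseline)       # move baseline toward the mean return

print(theta)  # theta ends up near the revenue-maximizing price of 5
```

With the quadratic revenue curve `price * (10 - price)`, the revenue-maximizing price is 5; a discretized (Q-learning-style) price grid can only approach this optimum up to its grid resolution, which is the gap the continuous PG policy closes.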