Abstract
We design a joint radio and computational resource allocation policy for a multi-user mobile edge computing system that minimizes the expected power consumption while satisfying long-term delay constraints. The problem is formulated as a constrained Markov decision process (CMDP) and efficiently solved by the proposed constrained reinforcement learning (CRL) algorithm, called successive convex programming based policy optimization (SCPPO). SCPPO solves a convex objective/feasibility surrogate problem at each update and provably converges to a Karush-Kuhn-Tucker (KKT) point of the original CMDP problem almost surely under mild conditions. Moreover, SCPPO adopts an application-specific policy architecture and employs a data-efficient estimation strategy that reuses old experiences, enabling fast learning with low computational complexity.
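The successive-convex-programming idea at the heart of SCPPO can be illustrated with a minimal sketch: at each update, the (possibly nonconvex) objective and constraint are replaced by convex surrogates around the current iterate, the surrogate is solved, and a smoothed step is taken toward its solution. The toy problem, function names, and step-size rule below are illustrative assumptions, not taken from the paper.

```python
# Hedged toy sketch of a successive convex approximation (SCA) loop, the
# generic template behind SCPPO-style updates. We minimize f(x) subject to
# g(x) <= 0, where f stands in for expected power and g for a long-term
# delay constraint; both are hypothetical placeholders.

def f(x):            # toy objective (expected-power proxy)
    return x ** 2

def df(x):           # its derivative
    return 2.0 * x

def g(x):            # toy constraint g(x) <= 0, i.e. x >= 1 (delay proxy)
    return 1.0 - x

def dg(x):           # its derivative
    return -1.0

def sca_step(x, tau=1.0):
    """Solve the convex surrogate around x:
         min_y  df(x)*(y - x) + (tau/2)*(y - x)**2
         s.t.   g(x) + dg(x)*(y - x) <= 0   (linearized constraint)."""
    y = x - df(x) / tau                 # unconstrained surrogate minimizer
    if g(x) + dg(x) * (y - x) > 0:      # infeasible -> project onto boundary
        y = x - g(x) / dg(x)
    return y

def solve(x0=3.0, gamma=0.5, iters=100):
    x = x0
    for _ in range(iters):
        y = sca_step(x)
        x = (1 - gamma) * x + gamma * y  # smoothed update with step size gamma
    return x

x_star = solve()     # converges to the KKT point x = 1 of this toy problem
```

In SCPPO the iterate is a policy parameter vector and the surrogate is built from sampled trajectories rather than exact derivatives, but the update structure (convex surrogate, feasibility handling, smoothed step) is the same.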