Abstract
This paper focuses on Multi-Agent Reinforcement Learning (MARL) in non-cooperative stochastic games, particularly addressing tasks whose completion is characterized by non-Markovian reward functions. We employ Reward Machines (RMs) to incorporate high-level task knowledge. First, we introduce Q-learning with Reward Machines for Stochastic Games (QRM-SG), in which RMs are predefined and available to the agents. QRM-SG learns each agent's best-response policy at a Nash equilibrium by defining the Q-function over an augmented state space that integrates the stochastic game state and the RM states. At each time step, the Lemke-Howson method is used to compute the best-response policies for the stage game defined by the current Q-functions. We then explore a more challenging scenario in which RMs are unavailable and propose Multi-Agent Reinforcement learning with Concurrent High-level knowledge inference (MARCH). MARCH uses automata learning to infer RMs iteratively and interleaves this process with QRM-SG to learn the best-response policies. RL episodes in which the obtained rewards are inconsistent with the rewards prescribed by the current RMs trigger the inference of new RMs. We prove that QRM-SG and MARCH converge to the best-response policies under certain conditions. Experiments in two scenarios demonstrate the superior performance of QRM-SG and MARCH compared to baseline methods.
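For illustration only, below is a minimal Python sketch (not from the paper) of the core idea of defining the Q-function over the augmented state space that pairs the game state with the RM state. The names `RewardMachine` and `q_update` and the tabular representation are assumptions; the plain max over actions stands in for the Nash-equilibrium value of the stage game, which QRM-SG actually computes with the Lemke-Howson method.

```python
import numpy as np
from collections import defaultdict

# Hypothetical sketch: an RM maps (RM state, high-level label) to the next
# RM state and a reward, making a non-Markovian reward Markovian on (s, u).
class RewardMachine:
    def __init__(self, transitions, rewards, u0):
        self.transitions = transitions  # dict: (u, label) -> next RM state
        self.rewards = rewards          # dict: (u, label) -> reward
        self.u0 = u0                    # initial RM state

    def step(self, u, label):
        u_next = self.transitions.get((u, label), u)
        r = self.rewards.get((u, label), 0.0)
        return u_next, r

N_ACTIONS = 4  # assumed action-space size for this sketch

# Q-function defined on the augmented state (s, u): game state x RM state.
Q = defaultdict(lambda: np.zeros(N_ACTIONS))

def q_update(Q, s, u, a, r, s_next, u_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step over the augmented state (s, u).

    NOTE: QRM-SG would replace the max below with the Nash-equilibrium
    value of the stage game defined by all agents' current Q-functions
    (computed via Lemke-Howson); a single-agent max keeps the sketch short.
    """
    target = r + gamma * np.max(Q[(s_next, u_next)])
    Q[(s, u)][a] += alpha * (target - Q[(s, u)][a])
```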