Abstract
This paper focuses on Multi-Agent Reinforcement Learning (MARL) in non-cooperative stochastic games, particularly addressing tasks whose completion is characterized by non-Markovian reward functions. We employ Reward Machines (RMs) to incorporate high-level task knowledge. First, we introduce Q-learning with Reward Machines for Stochastic Games (QRM-SG), in which RMs are predefined and available to the agents. QRM-SG learns each agent's best-response policy at a Nash equilibrium by defining the Q-function over an augmented state space that integrates the stochastic game state and the RM states. At each time step, the Lemke-Howson method is used to compute the best-response policies for the stage game defined by the current Q-functions. We then explore a more challenging scenario in which RMs are unavailable and propose Multi-Agent Reinforcement learning with Concurrent High-level knowledge inference (MARCH). MARCH uses automata learning to infer RMs iteratively and interleaves this process with QRM-SG to learn the best-response policies. RL episodes in which the obtained rewards are inconsistent with the rewards prescribed by the current RMs trigger the inference of new RMs. We prove that QRM-SG and MARCH converge to the best-response policies under certain conditions. Experiments in two scenarios demonstrate the superior performance of QRM-SG and MARCH compared to baseline methods.
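For illustration only, below is a minimal Python sketch (not from the paper) of the core idea of defining the Q-function over the augmented state space that pairs the game state with the RM state. The names `RewardMachine` and `q_update` and the tabular representation are assumptions; the plain max over actions stands in for the Nash-equilibrium value of the stage game, which QRM-SG actually computes with the Lemke-Howson method.

```python
import numpy as np
from collections import defaultdict

# Hypothetical sketch: an RM maps (RM state, high-level label) to the next
# RM state and a reward, making a non-Markovian reward Markovian on (s, u).
class RewardMachine:
    def __init__(self, transitions, rewards, u0):
        self.transitions = transitions  # dict: (u, label) -> next RM state
        self.rewards = rewards          # dict: (u, label) -> reward
        self.u0 = u0                    # initial RM state

    def step(self, u, label):
        u_next = self.transitions.get((u, label), u)
        r = self.rewards.get((u, label), 0.0)
        return u_next, r

N_ACTIONS = 4  # assumed action-space size for this sketch

# Q-function defined on the augmented state (s, u): game state x RM state.
Q = defaultdict(lambda: np.zeros(N_ACTIONS))

def q_update(Q, s, u, a, r, s_next, u_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step over the augmented state (s, u).

    NOTE: QRM-SG would replace the max below with the Nash-equilibrium
    value of the stage game defined by all agents' current Q-functions
    (computed via Lemke-Howson); a single-agent max keeps the sketch short.
    """
    target = r + gamma * np.max(Q[(s_next, u_next)])
    Q[(s, u)][a] += alpha * (target - Q[(s, u)][a])
```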