Abstract

Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out concurrently by the agents. In this paper we formalize and prove the convergence of a Distributed Round Robin Q-learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by enforcing a round-robin schedule for action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs that lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the globally optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning, in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.
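As a rough illustration of the scheme sketched in the abstract, the following Python snippet shows one way a round-robin scheduled Q-learning loop with modular state-action vetoes might be organized. The environment interface (env.reset, env.step_single) and all parameter values are hypothetical placeholders for illustration, not the paper's actual implementation.

```python
import random
from collections import defaultdict

class RoundRobinAgent:
    """Independent learner holding a local Q-table and a set of vetoed pairs."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)  # local state-action values
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.vetoed = set()  # MSAV-style vetoed (state, action) pairs

    def act(self, state):
        # Restrict the choice to non-vetoed actions (fall back to all if none remain).
        allowed = [a for a in range(self.n_actions) if (state, a) not in self.vetoed]
        if not allowed:
            allowed = list(range(self.n_actions))
        if random.random() < self.epsilon:
            return random.choice(allowed)
        return max(allowed, key=lambda a: self.q[state][a])

    def update(self, s, a, r, s_next, undesired_terminal):
        if undesired_terminal:
            self.vetoed.add((s, a))  # veto the pair that led to an undesired termination state
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])


def train_episode(env, agents):
    """One episode in which agents act strictly in round-robin order."""
    state, done, t = env.reset(), False, 0
    while not done:
        agent = agents[t % len(agents)]  # only the scheduled agent acts and learns
        action = agent.act(state)
        state_next, reward, done, undesired = env.step_single(t % len(agents), action)
        agent.update(state, action, reward, state_next, undesired)
        state, t = state_next, t + 1
```

Because only the scheduled agent selects an action and updates its local Q-table at each step, the remaining agents' policies are effectively frozen during that update, which is what removes the non-stationarity faced by concurrent independent learners.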

Highlights

  • The transportation of a hose by a team of robots is a paradigmatic instance of the tasks that can be performed with Multi-Component Robotic Systems (MCRS) [1]

  • We present two different distributed Round-Robin Q-Learning (D-RR-QL) update rules for stochastic games of type C-RR-SG that differ in their communication strategies: the first requires, at each time step, sending information from the current agent to the next agent in turn according to the Round-Robin schedule; the second is communication-free

  • Computational experiments show that Distributed Round Robin Q-learning (D-RR-QL) using Modular State-Action Veto (MSAV) policies can provide a valid joint-action policy approximating the optimal policy faster than Distributed Q-Learning (D-QL), Team Q-Learning [24] (Team-QL), and Coordinated Reinforcement Learning in over-constrained systems


Summary

Introduction

The transportation of a hose by a team of robots is a paradigmatic instance of the tasks that can be performed with Multi-Component Robotic Systems (MCRS) [1]. The simplest model-free RL algorithm is Q-learning [4, 5], which applies an iterative reward propagation rule to estimate the state-action value function implementing the optimal policy. Some authors have proposed model-based heuristic algorithms [30, 36] to estimate the most likely response of the remaining agents, using it to bias local policies towards coordinated joint actions. Since each state in an MDP can be regarded as a virtual stateless Stochastic Game (SG), adaptive methods [14, 37] have been proposed to bias local action selection towards a globally optimal joint action. These approaches require additional memory resources and knowledge about the optimal state-joint-action value function, scaling badly with the problem size. The use of the Coordination Graph reduces the state-action space by defining which actions are relevant to each local value function, and it can be further reduced by identifying which state variables are relevant to each local value function.
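For reference, the iterative reward propagation rule of standard tabular Q-learning mentioned above updates the state-action value estimate after each observed transition (s, a, r, s') as

Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ],

where α is the learning rate and γ the discount factor; acting greedily with respect to the converged Q implements the optimal policy.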

