Abstract

Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out concurrently by the agents. In this paper we formalize and prove the convergence of a Distributed Round Robin Q-learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by enforcing a round-robin schedule for action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs that lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent's local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the globally optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning, in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.
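As a rough illustration of the scheme sketched in the abstract, the following Python snippet shows one way a round-robin scheduled Q-learning loop with modular state-action vetoes might be organized. The environment interface (env.reset, env.step_single) and all parameter values are hypothetical placeholders for illustration, not the paper's actual implementation.

```python
import random
from collections import defaultdict

class RoundRobinAgent:
    """Independent learner holding a local Q-table and a set of vetoed pairs."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)  # local state-action values
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.vetoed = set()  # MSAV-style vetoed (state, action) pairs

    def act(self, state):
        # Restrict the choice to non-vetoed actions (fall back to all if none remain).
        allowed = [a for a in range(self.n_actions) if (state, a) not in self.vetoed]
        if not allowed:
            allowed = list(range(self.n_actions))
        if random.random() < self.epsilon:
            return random.choice(allowed)
        return max(allowed, key=lambda a: self.q[state][a])

    def update(self, s, a, r, s_next, undesired_terminal):
        if undesired_terminal:
            self.vetoed.add((s, a))  # veto the pair that led to an undesired termination state
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])


def train_episode(env, agents):
    """One episode in which agents act strictly in round-robin order."""
    state, done, t = env.reset(), False, 0
    while not done:
        agent = agents[t % len(agents)]  # only the scheduled agent acts and learns
        action = agent.act(state)
        state_next, reward, done, undesired = env.step_single(t % len(agents), action)
        agent.update(state, action, reward, state_next, undesired)
        state, t = state_next, t + 1
```

Because only the scheduled agent selects an action and updates its local Q-table at each step, the remaining agents' policies are effectively frozen during that update, which is what removes the non-stationarity faced by concurrent independent learners.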

Highlights

  • The transportation of a hose by a team of robots is a paradigmatic instance of the tasks that can be performed with Multi-Component Robotic Systems (MCRS) [1]

  • We present two different distributed Round-Robin Q-Learning (D-RR-QL) update rules for stochastic games of type C-RR-SG that differ in their communication strategies: the first requires, at each time step, sending information from the current agent to the next agent in turn according to the Round-Robin schedule; the second is communication-free

  • Computational experiments show that Distributed Round Robin Q-learning (D-RR-QL) using Modular State-Action Veto (MSAV) policies can provide a valid joint-action policy approximating the optimal policy faster than Distributed Q-Learning (D-QL), Team Q-Learning [24] (Team-QL), and Coordinated Reinforcement Learning in over-constrained systems


Summary

Introduction

The transportation of a hose by a team of robots is a paradigmatic instance of the tasks that can be performed with Multi-Component Robotic Systems (MCRS) [1]. The simplest model-free RL algorithm is Q-learning [4, 5], which applies an iterative reward propagation rule to estimate the state-action value function implementing the optimal policy. Some authors have proposed model-based heuristic algorithms [30, 36] to estimate the most likely response of the remaining agents, using it to bias local policies towards coordinated joint actions. Since each state in an MDP can be regarded as a virtual stateless Stochastic Game (SG), adaptive methods [14, 37] have been proposed to bias local action selection towards a globally optimal joint action. These approaches require additional memory resources and knowledge about the optimal state-joint-action value function, scaling badly with the problem size. The use of the Coordination Graph reduces the state-action space by defining which actions are relevant to each local value function, and it can be further reduced by identifying which state variables are relevant to each local value function.
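For reference, the iterative reward propagation rule of standard tabular Q-learning mentioned above updates the state-action value estimate after each observed transition (s, a, r, s') as

Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ],

where α is the learning rate and γ the discount factor; acting greedily with respect to the converged Q implements the optimal policy.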

