End-to-End Reinforcement Learning for Multi-agent Continuous Control

Zilong Jiao, Jae Oh

doi:10.1109/icmla.2019.00100

Abstract

In end-to-end reinforcement learning, an agent captures the entire mapping from its raw sensor data to actuation commands using a single neural network. End-to-end reinforcement learning has mostly been studied in single-agent domains, and its scalability to multi-agent settings is under-explored. Without suitable techniques, learning effective policies from the joint observation of agents can be intractable, particularly when the sensor data perceived by each agent is high-dimensional. Extending the multi-agent actor-critic method MADDPG, this paper presents Rec-MADDPG, an end-to-end reinforcement learning method for multi-agent continuous control in a cooperative environment. To ease end-to-end learning in a multi-agent setting, we proposed two embedding mechanisms, joint embedding and independent embedding, which project the agents' joint sensor observation onto low-dimensional features. For training efficiency, we applied parameter sharing and an A3C-based asynchronous framework to Rec-MADDPG. To reflect the challenges that can arise in real-world multi-agent control, we evaluated Rec-MADDPG on robotic navigation tasks with realistically simulated robots and physics-enabled environments. Through extensive evaluation, we demonstrated that Rec-MADDPG significantly outperformed MADDPG and was able to learn individual end-to-end policies for continuous control from raw sensor data. In addition, compared with joint embedding, independent embedding enabled Rec-MADDPG to learn even better policies.
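The abstract names the two embedding mechanisms without describing their structure, so the following is only an illustrative sketch, not the authors' architecture. It assumes that joint embedding means projecting the concatenation of all agents' observations through one layer, and that independent embedding means projecting each agent's observation separately (here with shared weights, loosely echoing the parameter sharing mentioned above) and concatenating the results. The helper names, the single tanh layer, and all dimensions are hypothetical.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Create weights and bias for a hypothetical single-layer embedding."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def embed(x, params):
    """Project a vector through the layer and squash it with tanh."""
    weights, bias = params
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def joint_embedding(observations, params):
    """Assumed form: concatenate every agent's observation, project once."""
    joint_obs = [v for obs in observations for v in obs]
    return embed(joint_obs, params)


def independent_embedding(observations, params):
    """Assumed form: project each observation separately, then concatenate."""
    return [f for obs in observations for f in embed(obs, params)]


rng = random.Random(0)
n_agents, obs_dim, feat_dim = 3, 64, 8  # illustrative sizes only

# Stand-ins for high-dimensional raw sensor readings, one vector per agent.
observations = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)]
                for _ in range(n_agents)]

joint_params = make_linear(n_agents * obs_dim, feat_dim, rng)
shared_params = make_linear(obs_dim, feat_dim, rng)

joint_features = joint_embedding(observations, joint_params)
independent_features = independent_embedding(observations, shared_params)

print(len(joint_features), len(independent_features))
```

Under these assumptions, the joint layer's input width grows with the number of agents, while the independent layer's input width stays fixed at a single agent's observation size. Whether this matches the paper's actual design is not stated in the abstract.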
