Abstract

Each robot uses reinforcement learning (RL) to control its maneuvers, and such robots can collaborate toward a common goal, forming a collaborative multi-agent system (MAS). Due to the constraints of distributed locations and the different poses of the robots, in practice each agent (robot) in such a collaborative MAS can only partially observe the environment and the other agents (such as competitive agents), and consequently must operate based on its belief of the state. The alignment of the collaborative agents' beliefs can therefore be enhanced through wireless communication, yet this is rarely studied in the literature. To explore wireless communication applied to collaborative partially observable reinforcement learning (PORL), we propose that each collaborative agent predict the environment dynamics, including the behavior of agents outside the collaborative MAS, and then construct a learning-based belief of the world (i.e., the global state). To assist this prediction and learning, we decompose the communication-assisted RL into two stages: prediction of the state, and a local actor with a critic on the global value(s). In other words, while one agent predicts and learns its own policy, another agent can update its critic on the history sequence to refine the global value(s), which in turn helps validate the prediction. Numerical experiments show that the timing of communication or information exchange among collaborative agents has a critical impact on the duration of learning and prediction, and thus on the performance of the MAS, which suggests the desirable communication design for distributed PORL among collaborative agents toward an efficient MAS.
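Since the abstract only outlines the two-stage design, the following Python sketch is an illustrative reading of it, not the paper's implementation: `BeliefPredictor`, `Agent`, `run`, `comm_interval`, and the toy linear dynamics and team reward are all hypothetical stand-ins. Stage 1 predicts the global state from the local observation concatenated with the collaborators' last shared beliefs; stage 2 runs a local softmax actor with a TD(0) critic evaluated on that belief, and a `comm_interval` parameter controls the timing of information exchange.

```python
# Hypothetical sketch of the two-stage PORL loop; NOT the authors' code.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, OBS_DIM, N_ACTIONS, N_AGENTS = 8, 4, 2, 2


class BeliefPredictor:
    """Stage 1: predict the global state from the local observation plus
    the collaborators' last shared beliefs (a linear model stands in for
    whatever learned predictor the paper actually uses)."""

    def __init__(self, in_dim, lr=1e-2):
        self.W = rng.normal(scale=0.1, size=(STATE_DIM, in_dim))
        self.lr = lr

    def predict(self, x):
        return self.W @ x

    def update(self, x, true_state):
        err = self.predict(x) - true_state      # error vs. realized state
        self.W -= self.lr * np.outer(err, x)


class Agent:
    """Stage 2: local softmax actor plus a TD(0) critic, both evaluated
    on the *belief* of the global state rather than the true state."""

    def __init__(self, lr=1e-2, gamma=0.95):
        self.predictor = BeliefPredictor(OBS_DIM + (N_AGENTS - 1) * STATE_DIM)
        self.theta = np.zeros((N_ACTIONS, STATE_DIM))   # actor weights
        self.v = np.zeros(STATE_DIM)                    # critic weights
        self.lr, self.gamma = lr, gamma

    def act(self, belief):
        logits = self.theta @ belief
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return rng.choice(N_ACTIONS, p=p), p

    def learn(self, belief, action, p, reward, next_belief):
        # TD error on the believed global state drives both updates.
        td = reward + self.gamma * self.v @ next_belief - self.v @ belief
        self.v += self.lr * td * belief
        grad = -np.outer(p, belief)                 # d log pi / d theta
        grad[action] += belief
        self.theta += self.lr * td * grad


def run(comm_interval, steps=3000):
    """Agents exchange beliefs every `comm_interval` steps; in between,
    each relies on stale copies, mimicking the communication-timing
    trade-off the abstract highlights."""
    agents = [Agent() for _ in range(N_AGENTS)]
    msgs = [np.zeros(STATE_DIM) for _ in range(N_AGENTS)]
    state = rng.normal(size=STATE_DIM)
    rewards = []
    for t in range(steps):
        xs, beliefs = [], []
        for i in range(N_AGENTS):
            obs = state[:OBS_DIM] + rng.normal(scale=0.1, size=OBS_DIM)
            others = np.concatenate([msgs[j] for j in range(N_AGENTS) if j != i])
            xs.append(np.concatenate([obs, others]))
            beliefs.append(agents[i].predictor.predict(xs[i]))
        if t % comm_interval == 0:              # information-exchange event
            msgs = [b.copy() for b in beliefs]
        target = 0 if state[0] > 0 else 1       # toy shared team objective
        acts = [ag.act(b) for ag, b in zip(agents, beliefs)]
        reward = float(np.mean([a == target for a, _ in acts]))
        rewards.append(reward)
        next_state = 0.9 * state + rng.normal(scale=0.1, size=STATE_DIM)
        for i, ag in enumerate(agents):
            ag.predictor.update(xs[i], state)   # validate the prediction
            others = np.concatenate([msgs[j] for j in range(N_AGENTS) if j != i])
            nx = np.concatenate([next_state[:OBS_DIM], others])
            a, p = acts[i]
            ag.learn(beliefs[i], a, p, reward, ag.predictor.predict(nx))
        state = next_state
    return np.mean(rewards[-500:])


for k in (1, 10, 100):
    print(f"comm_interval={k:>3}: avg team reward {run(k):.2f}")
```

Sweeping `comm_interval` in such a sketch is one way to probe the abstract's finding that the timing of information exchange, not merely its presence, governs how quickly beliefs align and hence how well the collaborative MAS performs.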
