Abstract

Recently, multi-agent deep reinforcement learning (MADRL) has been studied as a way to learn actions that achieve complicated tasks and to generate coordination structures among agents. Reward assignment in MADRL is a crucial factor in guiding agents, through their individual learning, toward both behaviors for their own tasks and coordinated behaviors. However, the effect of reward assignment on the learned coordinated behavior has not been sufficiently clarified. To address this issue, using a sequential task, the coordinated delivery and execution problem with expiration time, we analyze the effect of various ratios between the reward given for the subtask an agent is responsible for and the reward given for the whole task. We then propose a two-stage reward assignment with decay that lets agents learn both the actions for the subtasks they are responsible for and the coordinated actions that facilitate other agents' tasks. We experimentally show that the proposed method enables agents to learn both kinds of actions in a balanced manner, realizing effective coordination by reducing the number of tasks ignored by other agents. We also analyze the mechanism behind the emergence of the different coordinated behaviors.
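The idea of mixing an agent's own-subtask reward with the whole-task reward under a decaying schedule can be illustrated with a minimal sketch. The function name, the stage-switch point, and the exponential mixing rule below are illustrative assumptions, not the paper's actual definitions: in the first stage the agent receives only the reward for its own subtask, and in the second stage the weight on that reward decays so the whole-task (coordination) reward gradually dominates.

```python
# Hypothetical sketch of a two-stage reward assignment with decay.
# The switch episode, decay rate, and mixing rule are assumptions
# for illustration, not the formulation used in the paper.

def mixed_reward(own_reward: float, whole_task_reward: float,
                 episode: int, switch_episode: int = 500,
                 decay: float = 0.995) -> float:
    """Return one agent's reward signal at a given training episode.

    Stage 1 (episode < switch_episode): use only the own-subtask reward,
    so the agent first learns the actions it is responsible for.
    Stage 2: exponentially decay the weight on the own-subtask reward,
    so the whole-task reward (which reflects other agents' success)
    gradually shapes coordinated behavior.
    """
    if episode < switch_episode:
        ratio = 1.0                                   # stage 1: own reward only
    else:
        ratio = decay ** (episode - switch_episode)   # stage 2: decaying weight
    return ratio * own_reward + (1.0 - ratio) * whole_task_reward
```

Under this sketch, the ratio of individual to whole-task reward, which the paper varies and analyzes, is the `ratio` term; the two-stage schedule simply moves that ratio from 1 toward 0 over training.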
