Abstract

In this paper, we propose the strategy-following distributed attentional actor architecture after conditional attention (sfDA6-X) for multi-agent deep reinforcement learning (MADRL). The architecture makes coordinated behaviors in MADRL controllable through a saliency vector that captures both conditions from the environment and rough, high-level instructions given by external experts such as system designers or users. We introduce destination channels, which allow agents to be instructed to change their behavior simply by specifying the regions where individual agents should work. To validate the effectiveness of sfDA6-X, we conducted experiments on the object collection game and analyzed how agents change their coordinated and cooperative behaviors according to the given instructions. Our findings suggest that this approach offers a preliminary but promising means of controlling coordinated behaviors in MADRL via arbitrary instructions expressed as destination channels.
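The abstract does not detail how destination channels are encoded; one plausible realization, shown in the minimal sketch below, is an additional observation channel that masks the region an agent is instructed to work in. The function name, observation shapes, and binary-mask encoding here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def add_destination_channel(observation, region):
    """Append a destination channel to a grid-shaped observation.

    observation: (C, H, W) array of existing observation channels.
    region: ((row_min, row_max), (col_min, col_max)) bounds of the
            area where the agent is instructed to work (half-open).

    Hypothetical encoding for illustration: a binary mask over the
    grid, concatenated as one extra channel.
    """
    channel = np.zeros(observation.shape[1:], dtype=observation.dtype)
    (r0, r1), (c0, c1) = region
    channel[r0:r1, c0:c1] = 1.0  # mark the instructed region
    return np.concatenate([observation, channel[None]], axis=0)

# Example: instruct an agent to work in the upper-left quadrant
# of a 10x10 grid with three existing observation channels.
obs = np.zeros((3, 10, 10), dtype=np.float32)
obs_with_dest = add_destination_channel(obs, ((0, 5), (0, 5)))
print(obs_with_dest.shape)  # (4, 10, 10)
```

Under this assumed encoding, changing only the masked region changes the instruction an agent receives, without retraining or modifying any other part of the input, which matches the abstract's claim that behaviors can be redirected solely by specifying regions.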
