Abstract

In this article, we consider centralized training and decentralized execution (CTDE) with diverse and private reward functions in cooperative multiagent reinforcement learning (MARL). The main challenge is that an unknown number of agents, whose identities are also unknown, can deliberately generate malicious messages and transmit them to the central controller. We refer to these malicious actions as Byzantine attacks. First, without Byzantine attacks, we propose a reward-free deep deterministic policy gradient (RF-DDPG) algorithm, in which gradients of agents' critics rather than rewards are sent to the central controller to preserve privacy. Second, to cope with Byzantine attacks, we develop a robust extension of RF-DDPG termed R2F-DDPG, which replaces the vulnerable average aggregation rule with robust ones. We propose a novel class of RL-specific Byzantine attacks that defeat conventional robust aggregation rules, motivating the projection-boosted robust aggregation rules for R2F-DDPG. Numerical experiments show that RF-DDPG successfully trains agents to work cooperatively and that R2F-DDPG is robust to Byzantine attacks.
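To make the vulnerability concrete, here is a minimal sketch of the failure mode the abstract describes: a central controller that averages per-agent critic gradients can be pulled arbitrarily far off course by a few malicious reports, while a coordinate-wise median, one conventional robust aggregation rule, stays near the honest gradient. This is an illustrative assumption-laden toy, not the authors' implementation; the paper's projection-boosted rules are not reproduced here, and all names and values (`num_agents`, `byzantine_ids`, the gradient itself) are hypothetical.

```python
# Toy illustration (not the paper's code): naive mean vs. coordinate-wise
# median aggregation of per-agent critic gradients under Byzantine reports.
import numpy as np

rng = np.random.default_rng(0)

num_agents = 10          # hypothetical number of reporting agents
grad_dim = 4             # hypothetical critic-gradient dimensionality
byzantine_ids = {3, 7}   # attacker identities; unknown to the controller in practice

# Honest agents report noisy versions of a shared "true" gradient.
true_grad = np.array([1.0, -0.5, 0.25, 2.0])
grads = np.stack([true_grad + 0.1 * rng.standard_normal(grad_dim)
                  for _ in range(num_agents)])

# Byzantine agents replace their reports with large adversarial vectors.
for i in byzantine_ids:
    grads[i] = -50.0 * true_grad

mean_agg = grads.mean(axis=0)          # vulnerable: two attackers dominate the average
median_agg = np.median(grads, axis=0)  # robust: extreme reports are ignored per coordinate

print("true gradient:   ", true_grad)
print("mean aggregate:  ", np.round(mean_agg, 2))
print("median aggregate:", np.round(median_agg, 2))
```

Running this, the mean aggregate flips sign and grows by an order of magnitude, while the median stays close to the true gradient, which is the intuition behind replacing the average rule in R2F-DDPG. The RL-specific attacks the abstract mentions are precisely those crafted so that even rules like the median fail, which is what motivates the projection-boosted variants.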
