Multi-Agent Reinforcement Learning (MARL) is a key framework for building intelligent systems in which multiple agents operate within a shared environment, with applications spanning autonomous driving, robotics, and distributed control systems. However, real-world deployment of MARL raises significant trust and safety challenges, as these systems are susceptible to a range of attacks that can compromise their robustness and reliability. This paper provides a comprehensive review of trust and safety attacks in MARL, categorizing the types of attacks and their implications. We examine existing defense mechanisms designed to mitigate these threats, highlighting their strengths and limitations. Finally, we identify open challenges and propose future research directions to enhance the robustness and security of MARL systems.