The rapid growth in the number of devices and their connectivity has enlarged the attack surface and made cyber systems more vulnerable. As attackers become increasingly sophisticated and resourceful, mere reliance on traditional cyber protection, such as intrusion detection, firewalls, and encryption, is insufficient to secure the cyber systems. Cyber resilience provides a new security paradigm that complements inadequate protection with resilience mechanisms. A Cyber-Resilient Mechanism (CRM) adapts to the known or zero-day threats and uncertainties in real-time and strategically responds to them to maintain the critical functions of the cyber systems in the event of successful attacks. Feedback architectures play a pivotal role in enabling the online sensing, reasoning, and actuation process of the CRM. Reinforcement Learning (RL) is an important gathering of algorithms that epitomize the feedback architectures for cyber resilience. It allows the CRM to provide dynamic and sequential responses to attacks with limited or without prior knowledge of the environment and the attacker. In this work, we review the literature on RL for cyber resilience and discuss the cyber-resilient defenses against three major types of vulnerabilities, i.e., posture-related, information-related, and human-related vulnerabilities. We introduce moving target defense, defensive cyber deception, and assistive human security technologies as three application domains of CRMs to elaborate on their designs. The RL algorithms also have vulnerabilities themselves. We explain the major vulnerabilities of RL and present develop several attack models where the attacker target the information exchanged between the environment and the agent: the rewards, the state observations, and the action commands. We show that the attacker can trick the RL agent into learning a nefarious policy with minimum attacking effort. The paper introduces several defense methods to secure the RL-enabled systems from these attacks. However, there is still a lack of works that focuses on the defensive mechanisms for RL-enabled systems. Last but not least, we discuss the future challenges of RL for cyber security and resilience and emerging applications of RL-based CRMs.
Read full abstract