Abstract

Alert prioritization plays a critical role in network security, as it helps security teams manage and respond to the overwhelming volume of alerts generated by intrusion detection systems (IDSs). Although the deep reinforcement learning (DRL) based deep deterministic policy gradient (DDPG) algorithm has achieved good performance for alert prioritization, it still suffers from instability and limited exploration capability. In this paper, we propose TD3-AP and SAC-AP, two DRL-empowered off-policy actor-critic alert prioritization methods based on twin delayed deep deterministic policy gradient (TD3) and soft actor-critic (SAC), respectively. We model the interaction between an adversary and a defender as a zero-sum game and use a double oracle framework to obtain a mixed strategy Nash equilibrium (MSNE). Our objective is to minimize the defender's loss, which represents the defender's inability to investigate alerts generated by attacks. To demonstrate the benefits and scalability of the proposed approaches, we conduct extensive experiments on three datasets: MQTT-IoT-IDS2020, DARPA 2000 LLDOS 1.0, and CSE-CIC-IDS2018. The results show that TD3-AP and SAC-AP reduce the defender's loss by 50% and 14.28%, respectively, compared to the alert prioritization method based on DDPG. The improvements are even larger when the proposed approaches are compared with traditional alert prioritization methods, including Uniform, Snort, Fuzzy, and RAP. Additionally, we analyze the interpretability of our results using SHapley Additive exPlanations (SHAP). We assess the sensitivity of our proposed approaches to different hyperparameters and evaluate the training time and computational resources required for their implementation.
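As an illustration of the game-theoretic setup described above, the following is a minimal, self-contained sketch (not the authors' code) of a double oracle loop that computes a mixed strategy Nash equilibrium of a zero-sum defender-versus-adversary game. The loss matrix, the fixed pool of pure strategies, and the helper names solve_msne and double_oracle are all hypothetical; in the paper, the best-response oracles would instead train TD3-AP/SAC-AP defender policies and adversary policies against the opponent's current mixture.

```python
# Minimal, illustrative sketch of a double oracle loop for a zero-sum
# defender-vs-adversary game (hypothetical values; not the authors' code).
import numpy as np
from scipy.optimize import linprog

# Hypothetical defender-loss matrix: rows = defender pure strategies,
# columns = adversary pure strategies.
LOSS = np.array([[4.0, 1.0, 3.0],
                 [2.0, 3.0, 1.0],
                 [1.0, 2.0, 4.0]])

def solve_msne(sub):
    """MSNE of a restricted zero-sum game via the standard minimax LP.

    Returns the row (minimizing) player's mixture and the game value.
    """
    n_row, n_col = sub.shape
    c = np.zeros(n_row + 1)
    c[-1] = 1.0                                       # minimize the value v
    A_ub = np.hstack([sub.T, -np.ones((n_col, 1))])   # x^T sub[:, j] <= v
    b_ub = np.zeros(n_col)
    A_eq = np.zeros((1, n_row + 1))
    A_eq[0, :n_row] = 1.0                             # mixture sums to one
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * n_row + [(None, None)])
    return res.x[:n_row], res.x[-1]

def double_oracle(loss):
    def_set, adv_set = [0], [0]               # start with one strategy each
    while True:
        sub = loss[np.ix_(def_set, adv_set)]
        def_mix, value = solve_msne(sub)      # defender's equilibrium mixture
        adv_mix, _ = solve_msne(-sub.T)       # adversary's equilibrium mixture
        # Best-response oracles: in the paper these would train new DRL
        # policies against the opponent's mixture; here we scan a fixed pool.
        br_def = int(np.argmin(loss[:, adv_set] @ adv_mix))
        br_adv = int(np.argmax(def_mix @ loss[def_set, :]))
        if br_def in def_set and br_adv in adv_set:
            return def_mix, def_set, value    # converged: no new best responses
        if br_def not in def_set:
            def_set.append(br_def)
        if br_adv not in adv_set:
            adv_set.append(br_adv)

mix, strategies, value = double_oracle(LOSS)
print("defender mixture over strategies", strategies, "=", np.round(mix, 3),
      "| expected defender loss", round(value, 3))
```

The restricted game at each iteration is solved with the standard minimax linear program, and the loop terminates once neither player can improve by adding a new pure strategy, which is the double oracle convergence condition.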
