Adaptive Alert Management for Balancing Optimal Performance among Distributed CSOCs using Reinforcement Learning

Ankit Shah,Rajesh Ganesan,Hasan Cam,Sushil Jajodia,Pierangela Samarati

doi:10.1109/tpds.2019.2927977

Abstract

Large organizations typically have Cybersecurity Operations Centers (CSOCs) distributed at multiple locations that are independently managed, and they have their own cybersecurity analyst workforce. Under normal operating conditions, the CSOC locations are ideally staffed such that the alerts generated from the sensors in a work-shift are thoroughly investigated by the scheduled analysts in a timely manner. Unfortunately, when adverse events such as increase in alert arrival rates or alert investigation rates occur, alerts have to wait for a longer duration for analyst investigation, which poses a direct risk to organizations. Hence, our research objective is to mitigate the impact of the adverse events by dynamically and autonomously re-allocating alerts to other location(s) such that the performances of all the CSOC locations remain balanced. This is achieved through the development of a novel centralized adaptive decision support system whose task is to re-allocate alerts from the affected locations to other locations. This re-allocation decision is non-trivial because the following must be determined: (1) timing of a re-allocation decision, (2) number of alerts to be re-allocated, and (3) selection of the locations to which the alerts must be distributed. The centralized decision-maker (henceforth referred to as agent) continuously monitors and controls the level of operational effectiveness-LOE (a quantified performance metric) of all the locations. The agent's decision-making framework is based on the principles of stochastic dynamic programming and is solved using reinforcement learning (RL). In the experiments, the RL approach is compared with both rule-based and load balancing strategies. By simulating real-world scenarios, learning the best decisions for the agent, and applying the decisions on sample realizations of the CSOC's daily operation, the results show that the RL agent outperforms both approaches by generating (near-) optimal decisions that maintain a balanced LOE among the CSOC locations. Furthermore, the scalability experiments highlight the practicality of adapting the method to a large number of CSOC locations.

Full Text