Abstract

In a cybersecurity operations center (CSOC), under normal operating conditions in a day, sufficient numbers of analysts are available to analyze the amount of alert workload generated by intrusion detection systems (IDSs). For the purpose of this paper, this means that the cybersecurity analysts can fully investigate each and every alert that is generated by the IDSs in a reasonable amount of time. However, there are a number of disruptive factors that can adversely impact the normal operating conditions such as (1) higher alert generation rates from a few IDSs, (2) new alert patterns that decreases the throughput of the alert analysis process, and (3) analyst absenteeism. The impact of all the above factors is that the alerts wait for a long duration before being analyzed, which impacts the readiness of the CSOC. It is imperative that the readiness of the CSOC be quantified, which in this paper is defined as the level of operational effectiveness (LOE) of a CSOC. LOE can be quantified and monitored by knowing the exact deviation of the CSOC conditions from normal and how long it takes for the condition to return to normal. In this paper, we quantify LOE by defining a new metric called total time for alert investigation (TTA), which is the sum of the waiting time in the queue and the analyst investigation time of an alert after its arrival in the CSOC database. A dynamic TTA monitoring framework is developed in which a nominal average TTA per hour (avgTTA/hr) is established as the baseline for normal operating condition using individual TTA of alerts that were investigated in that hour. At the baseline value of avgTTA/hr, LOE is considered to be ideal. Also, an upper-bound (threshold) value for avgTTA/hr is established, below which the LOE is considered to be optimal. Several case studies illustrate the impact of the above disruptive factors on the dynamic behavior of avgTTA/hr, which provide useful insights about the current LOE of the system. Also, the effect of actions taken to return the CSOC to its normal operating condition is studied by varying both the amount and the time of action, which in turn impacts the dynamic behavior of avgTTA/hr. Results indicate that by using the insights learnt from measuring, monitoring, and controlling the dynamic behavior of avgTTA/hr, a manager can quantify and color-code the LOE of the CSOC. Furthermore, the above insights allow for a deeper understanding of acceptable downtime for the IDS, acceptable levels for absenteeism, and the recovery time and effort needed to return the CSOC to its ideal LOE.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call