Abstract

As network threats have grown in number and variety, manual threat response by security analysts has reached its limits. To compensate, security information and event management (SIEM), a response system that collects and analyzes threat events from security devices, has become increasingly important. SIEM has generally adopted signature-based threat classification, which generates large volumes of false threat events and adds to the workload of security analysts. To address this limitation, researchers have attempted to develop AI-based threat classification models; deep-learning-based models in particular are known to classify threats with high accuracy. In this study, we focus on the excessive overhead that deep learning models incur when learning from and classifying large sets of threat events, which becomes a burden in actual SIEM operations. We selected representative deep-learning-based models for threat classification (CNN, LSTM, and GRU) based on the models used in previous studies, and extended them to account for the characteristics of actual threat events collected from a security operation center (SOC) environment. Specifically, CNN-static(2D) and LSTM were extended to use our own encoding method. CNN-static(1D) was developed to reduce model learning and threat classification time while minimizing the loss of accuracy relative to CNN-static(2D). GRU was used for threat classification for the first time in this study. Our aim was to identify the most effective deep learning model for SIEM in terms of both threat classification accuracy and learning and classification efficiency. To reflect real workloads, we used 2.6 million threat events collected from a representative security vendor in South Korea and manually annotated by security experts.
First, in terms of threat classification accuracy, the LSTM model achieved the highest F1-score, but all models achieved high recall, between 94.07% and 95.11%; recall is the most important metric in a SOC. Second, in terms of model learning and classification times, CNN-static(1D) significantly outperformed the other models. To validate the models in an actual SOC environment, we also present two cost-based scenarios: the cost of rebuilding the model to respond to zero-day attacks, and the cost of classifying threats in the actual SOC environment. We show that CNN-static(1D) is the most cost-effective model in both scenarios.
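
For readers less familiar with the metrics named above, the following is a minimal sketch of how precision, recall, and F1-score are computed for a binary threat classifier. The function name `classification_metrics`, the label convention (1 = true threat, 0 = false threat event), and the toy labels are assumptions made for illustration only; they are not taken from the study's dataset or code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for one positive class.

    Hypothetical helper for illustration; labels are assumed to be
    1 for a true threat and 0 for a false threat event.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Precision: share of flagged events that are real threats.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of real threats that the model flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (made-up labels, not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = classification_metrics(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Recall counts missed threats (false negatives) against the model, which is one plausible reading of why it is treated as the most important metric in a SOC.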
