IDPS Signature Classification Based on Active Learning With Partial Supervision From Network Security Experts

Hidetoshi Kawaguchi, Shogo Okada, Yuichi Nakatani

doi:10.1109/access.2022.3211651

Abstract

As intrusion detection and prevention systems (IDPSs) grow in importance, the cost of managing signatures, which are pattern files of malicious communications, continues to increase. To ensure the optimal operation of an IDPS, network security experts need to classify the signatures generated over time according to their importance (low, medium, or high). Although machine learning approaches can classify signatures automatically in place of human experts, applying them poses several challenges, including (a) high annotation costs, (b) security incidents caused by classification errors, and (c) decreases in classification accuracy due to domain shifts. To overcome these challenges, we propose a system based on active learning that collaborates with experts to periodically classify received signatures. The signatures are sorted by uncertainty sampling; some are transferred to experts, and the rest are automatically classified. The experts classify the transferred signatures, which are then added to the training dataset, and the classification model is retrained. After training, the new signatures that have not yet been labeled are classified. The proposed system executes this workflow each time it receives signatures. For evaluation purposes, a real dataset was collected monthly with the help of the experts. Experiments were conducted on this dataset to evaluate the proposed system in a simulation, and an analysis was then performed by comparing several variants of the proposed system. The results show that the variant with Monte Carlo dropout (MC-Dropout) performs best. We also show that this variant has two effects: it transfers more samples with “medium” importance to the experts, and it mitigates imbalances in the training dataset.
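
The sorting and splitting step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the model's MC-Dropout outputs are already available as an array of class probabilities from several stochastic forward passes, and it assumes predictive entropy as the uncertainty score, which the abstract does not specify. The names `split_by_uncertainty`, `mc_probs`, and `n_to_experts` are hypothetical.

```python
import numpy as np

# Importance classes named in the abstract.
LABELS = ("low", "medium", "high")


def split_by_uncertainty(mc_probs, n_to_experts):
    """Split a batch of signatures into expert-reviewed and auto-classified.

    mc_probs: array of shape (T, N, C) holding class probabilities from
        T stochastic forward passes (dropout kept active) for N signatures.
    n_to_experts: hypothetical budget of signatures the experts can label.
    """
    # Average over the T passes to get the predictive distribution.
    mean_probs = mc_probs.mean(axis=0)  # shape (N, C)

    # Assumed uncertainty score: entropy of the predictive distribution.
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=1)

    # Sort from most to least uncertain.
    order = np.argsort(-entropy, kind="stable")

    # The most uncertain signatures go to the experts.
    expert_idx = order[:n_to_experts]

    # The rest receive the model's most probable class.
    auto_labels = {
        int(i): LABELS[int(mean_probs[i].argmax())]
        for i in order[n_to_experts:]
    }
    return expert_idx, auto_labels
```

In a full loop, the expert labels for `expert_idx` would be appended to the training set and the model retrained before the next batch of signatures arrives.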

Full Text