Network Intrusion Detection Systems (NIDS) have been extensively investigated for monitoring real network traffic and analyzing suspicious activities. However, NIDS have limitations in detecting specific types of attacks, such as Advanced Persistent Threats (APT). Additionally, a NIDS cannot always observe complete traffic information because of encrypted traffic or a lack of authority. To address these limitations, a Host-based Intrusion Detection System (HIDS) evaluates resources in the host, including logs, files, and folders, to identify APT attacks that routinely inject malicious files into victimized nodes. In this study, a hybrid network intrusion detection system that combines NIDS and HIDS is proposed to improve intrusion detection performance. The host data are converted from a textual representation to a numerical one by a Natural Language Processing (NLP)-based Bidirectional Encoder Representations from Transformers (BERT) model, so that they can be processed by machine learning models in the same way as the network flow data. A feature flattening technique is applied to flatten the two-dimensional host-based features produced by BERT into one-dimensional vectors so that host-based and network flow-based features can be processed together by advanced Machine Learning (ML) models. To enhance HIDS effectiveness, a two-stage collaborative classifier is utilized, which applies two tiers of machine learning algorithms, a binary classifier and a multi-class classifier, to detect network intrusions. The binary classifier first separates benign samples from attacks, which reduces the complexity of the original problem; the attack data are then classified by a multi-class supervised learner to identify attack types. As a result, the two-stage collaborative model outperforms the baseline classifier, XGBoost. The proposed method is shown to generalize across two well-known datasets, CICIDS 2018 and NDSec-1.
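The feature flattening and two-stage routing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy classifiers, the label strings, and the tiny feature values are all hypothetical stand-ins, and in practice the two callables would be trained models and the host matrix would come from a BERT encoder.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def flatten_host_features(token_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Flatten a 2-D (tokens x hidden size) matrix row by row into a 1-D vector."""
    return [value for row in token_embeddings for value in row]


def combine_features(host_matrix: Sequence[Sequence[float]],
                     flow_features: Sequence[float]) -> Vector:
    """Concatenate flattened host features with network flow features."""
    return flatten_host_features(host_matrix) + list(flow_features)


def two_stage_predict(sample: Vector,
                      is_attack: Callable[[Vector], bool],
                      attack_type: Callable[[Vector], str]) -> str:
    """Stage 1 separates benign from attack; stage 2 names the attack type."""
    if not is_attack(sample):
        return "Benign"
    return attack_type(sample)


# Hypothetical stand-ins for trained models (assumptions for illustration only).
def toy_binary(x: Vector) -> bool:
    return sum(x) > 1.0


def toy_multiclass(x: Vector) -> str:
    return "DoS" if x[-1] > 0.5 else "BruteForce"


host = [[0.1, 0.2], [0.3, 0.4]]   # pretend 2 tokens x 2 hidden dimensions
flow = [0.9]                      # pretend single flow-level feature
sample = combine_features(host, flow)
label = two_stage_predict(sample, toy_binary, toy_multiclass)
print(sample, label)
```

Only samples flagged as attacks by the first stage reach the second stage, so in this sketch the multi-class model never has to model the benign class.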
The performance of XGBoost, which represents conventional ML, is evaluated. Combining host and network features enhances attack detection performance (macro average F1 score) by 8.1% on the CICIDS 2018 dataset and by 3.7% on the NDSec-1 dataset. Meanwhile, the two-stage collaborative classifier improves detection performance for most individual classes, especially DoS-LOIC-UDP and DoS-SlowHTTPTest, with improvements of 30.7% and 84.3%, respectively, compared with the traditional ML models.
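For readers unfamiliar with the metric, the macro average F1 score is the unweighted mean of the per-class F1 scores, so rare attack classes count as much as frequent ones. The sketch below shows the computation on made-up labels; the function names and the toy data are assumptions for illustration and do not reproduce any result from the study.

```python
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Return the F1 score of every label that appears in y_true or y_pred."""
    scores: Dict[str, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); the guard is purely defensive.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Made-up labels, for illustration only.
y_true = ["Benign", "Benign", "DoS", "DoS"]
y_pred = ["Benign", "DoS", "DoS", "DoS"]
class_scores = per_class_f1(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
print(class_scores, macro)
```

Reporting per-class scores alongside the macro average, as the abstract does for DoS-LOIC-UDP and DoS-SlowHTTPTest, shows which classes drive a change in the average.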