While conventional Intrusion Detection Systems (IDS) are essential for defending against intruders in the Industrial Internet of Things (IIoT), the handling of heterogeneous and streaming data sources deserves more attention. This work introduces a novel Optimized iForest-based Intrusion Detection System (OIFIDS) designed to handle both heterogeneous and streaming data efficiently. The proposed approach employs a collection of optimized binary trees, each trained on a distinct subset of the data, in which the location of empty leaves determines the anomaly score assigned to a given data point. The isolation Forest (iForest) is optimized with a modified version of the Harris Hawks Optimization algorithm that exploits both Exploration factor and Random walk strategies (ERHHO). This optimization reduces the dataset's dimensionality, shortens the learning time, and improves detection precision, accuracy, F1-score, recall, and false positive rate (FPR). To demonstrate its effectiveness, the proposed approach is evaluated on three datasets: CICIDS-2018, NSL-KDD, and UNSW-NB15. The experimental results demonstrate that the approach handles both heterogeneous and streaming data efficiently and delivers results comparable to cutting-edge baseline techniques. Moreover, it performs effectively when the training sample contains no anomalies and in challenging scenarios with high dimensionality and many irrelevant features. In a comparison with various state-of-the-art IDSs, the proposed approach detects intrusions with higher accuracy than the other approaches, reaching 95.6%, 94.8%, and 99% on the NSL-KDD, UNSW-NB15, and CICIDS-2018 datasets, respectively. Experiments on heterogeneous data reveal that the Area Under the ROC Curve (AUC) of OIFIDS beats the baseline approach on the UNSW-NB15 dataset and exceeds the second-best method by 8% and 2.4% on NSL-KDD and CICIDS-2018, respectively. Evaluation on streaming data shows that the system addresses the concept drift problem well, with high AUC values of 0.948, 0.97, and 0.922 on the NSL-KDD, CICIDS-2018, and UNSW-NB15 datasets, respectively.
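For readers unfamiliar with isolation-based detection, the general idea behind iForest can be sketched as follows. This is a minimal illustration of the standard Isolation Forest scoring scheme (random axis-parallel splits, anomaly score derived from average path length), not the OIFIDS method itself: the ERHHO optimization and the empty-leaf scoring described above are not reproduced. All function names, parameters, and the synthetic two-dimensional data are assumptions made for illustration only.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # nothing to split on this feature
        return {"size": len(points)}
    threshold = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point lands, adjusted for unsplit leaf contents."""
    if "feature" not in node:
        return depth + c_factor(node["size"])
    side = "left" if point[node["feature"]] < node["threshold"] else "right"
    return path_length(point, node[side], depth + 1)


def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Train each tree on its own random subsample of the data."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(point, trees, sample_size):
    """Score in (0, 1); shorter average paths give scores closer to 1."""
    mean_depth = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(sample_size))


# Hypothetical demo: train on anomaly-free synthetic data, then score two points.
data_rng = random.Random(42)
normal_data = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(300)]
trees, psi = fit_forest(normal_data)

inlier_score = anomaly_score([0.0, 0.0], trees, psi)
outlier_score = anomaly_score([8.0, 8.0], trees, psi)
print(f"inlier: {inlier_score:.3f}, outlier: {outlier_score:.3f}")
```

In this sketch the forest is trained on anomaly-free data only, and a point far from the training distribution is isolated after fewer splits, so it receives the higher score. A feature-selection step such as ERHHO would, under this framing, run before `fit_forest` to reduce the number of columns each tree can split on.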