In real-time application domains such as finance, healthcare, and defence, service delays or information theft may have irreversible consequences, so early detection of intrusions is important for preventing security breaches. Recently, anomaly-based intrusion detection using hybrid deep learning approaches has become increasingly popular. The most widely used benchmark datasets in the literature, NSL-KDD and UNSW-NB15, are both imbalanced. Models built on imbalanced datasets may be biased towards the majority classes and neglect the minority classes, even though all classes are equally important. In many cases, high accuracy is achieved for the majority classes while class-level performance on the minority classes remains poor. Class balancing therefore plays an important role in reducing prediction bias on imbalanced datasets. In this paper, a Hybrid Deep Learning Based Intrusion Detection (HDLBID) framework is proposed that combines CNN and BiLSTM. Four techniques, namely Random Oversampling (ROS), ADASYN, SMOTE, and SMOTE-Tomek, are used for class balancing in the proposed HDLBID framework. The proposed HDLBID with SMOTE-Tomek achieves an overall accuracy of 99.6% on NSL-KDD and 89.02% on UNSW-NB15, an improvement of 13.67% for NSL-KDD and 10.62% for UNSW-NB15 over existing recent related models. In addition to overall accuracy, the class-level F1 score is also calculated. A comparative study is presented to show the effectiveness of balanced datasets compared with their imbalanced versions, and SMOTE-Tomek class balancing is observed to perform comparatively well. An improvement of 37.43% is observed in the U2R class of NSL-KDD and 61.65% in the Worms class of UNSW-NB15, both with SMOTE-Tomek class balancing.
Therefore, the proposed HDLBID with SMOTE-Tomek class balancing reports the best overall accuracy compared with existing recent related approaches. In terms of class-level analysis, HDLBID also reports its best results with SMOTE-Tomek compared with the imbalanced versions of the datasets.
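The gap between overall accuracy and class-level F1 on an imbalanced dataset, and the effect of random oversampling on class counts, can be illustrated with a minimal standard-library sketch. It is not the HDLBID implementation: the helper names (`per_class_f1`, `accuracy`, `random_oversample`), the toy label counts, and the always-majority "model" are assumptions made purely for illustration.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Toy labels (illustrative only): 98 "normal" records and 2 "u2r" records.
y_true = ["normal"] * 98 + ["u2r"] * 2
# A degenerate model that always predicts the majority class.
y_pred = ["normal"] * 100

acc = accuracy(y_true, y_pred)      # high, despite missing every "u2r" record
f1 = per_class_f1(y_true, y_pred)   # F1 for "u2r" is 0.0

# Balance the toy data; X is just a list of sample indices here.
X_bal, y_bal = random_oversample(list(range(100)), y_true)
balanced_counts = Counter(y_bal)
```

In this toy case the overall accuracy is 0.98 while the F1 score for the minority class is 0.0, which is the kind of bias a class-level analysis exposes. Random oversampling only duplicates existing minority samples; SMOTE, ADASYN, and SMOTE-Tomek generate or clean samples differently and are not reproduced here.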