Abstract

In training machine learning and artificial intelligence models for Intrusion Detection Systems (IDS), feature selection plays a critical role in determining the prediction performance and explainability of the trained model. Feature selection in IDS design is often hindered by the volume, variety, and veracity of the data generated by Internet-of-Things (IoT) and Cyber-Physical Systems (CPS) devices. In this paper, we explored selecting the best subset of features to reduce the feature space of high-dimensional datasets and thereby improve performance, explainability, and computing time. We combined permutation importance as the feature selection method for XGBoost (XGB) models with prediction explainability methods such as SHAP and LIME. We evaluated the approach on two publicly available IDS datasets, NSL-KDD and CCID-V1. For the NSL-KDD data, our feature selection-based XGB model reduced the features from 42 to 20 and raised the AUC score from 0.8530 to 0.8751, with a 60% improvement in training time. A similar model for the CCID-V1 data reduced the features from 82 to 22 and achieved an AUC of 0.9999 with a 46% improvement in computing time. We also observed that the SHAP and LIME explanations of the predictions were consistent in the features they identified as important. Our study demonstrated that feature selection improved performance and explainability while lowering training time, which increases the usability of our model for the design of IDS.
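To make the idea of permutation importance concrete, here is a minimal sketch in plain Python. It is not the paper's pipeline: it assumes a stand-in threshold rule (`toy_model`) in place of XGB and uses accuracy in place of AUC, and every name in it (`permutation_importance`, `select_features`, the toy data) is hypothetical. The point is only the mechanism: shuffle one feature column at a time, measure how much the score drops, and keep the features whose drop exceeds a threshold.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows the model labels correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving all other columns intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def select_features(importances, threshold=0.0):
    """Indices of features whose importance exceeds the threshold."""
    return [j for j, imp in enumerate(importances) if imp > threshold]


# Toy data (assumption): feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(42)
X = [[float(i % 2), data_rng.random()] for i in range(200)]
y = [i % 2 for i in range(200)]


def toy_model(row):
    """Stand-in classifier (assumption): thresholds feature 0, ignores feature 1."""
    return int(row[0] >= 0.5)


importances = permutation_importance(toy_model, X, y)
selected = select_features(importances)
print("importances:", importances)
print("selected feature indices:", selected)
```

In this toy setup the noise feature receives an importance of exactly zero because the stand-in model never reads it, so only feature 0 is kept. With a real trained model and a held-out set, the threshold and the number of repeats would be choices to tune rather than fixed values.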
