Abstract

The rapid advancement of the Internet has stimulated the development of intelligent data mining systems for detecting intrusion attacks. The performance of such systems can be degraded by the large datasets employed in the learning phase. Determining an appropriate subset of features within the training data is therefore an essential step in building data mining classification models. Nevertheless, the resulting reduced feature set should maintain, or even improve, the performance of the classification models. In this article, a new feature selection algorithm, called "the Highest Wins" (HW), is proposed. To evaluate the generalization ability of HW, it was applied to build classification models with the naive Bayes technique on 10 benchmark datasets. The obtained results were compared against two well-known strategies, namely chi-square and information gain. The experimental results confirm the competitiveness of the proposed strategy across various evaluation measures such as recall, precision, and error rate, while significantly reducing the number of selected features. Furthermore, HW was used to build naive Bayes and decision tree intrusion detection classifiers on the well-known Network Security Laboratory-Knowledge Discovery in Databases (NSL-KDD) dataset. The results were promising not only in terms of overall performance, but also in terms of the time needed to build the classification model.
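The HW algorithm itself is not detailed in this abstract, so it is not reproduced here. As an illustrative sketch only, the snippet below shows the kind of baseline comparison the abstract describes: filter-based feature selection followed by a naive Bayes classifier, using scikit-learn's chi2 and mutual_info_classif as stand-ins for the chi-square and information-gain strategies. The dataset, the number of retained features k, and the accuracy-based error-rate estimate are assumptions made for the example, not details taken from the paper.

```python
# Illustrative sketch only: pairs two filter-based feature-selection baselines
# (chi-square and an information-gain proxy) with a naive Bayes classifier and
# reports cross-validated accuracy and error rate. Dataset and k are assumed.
from sklearn.datasets import load_breast_cancer              # stand-in benchmark dataset
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler               # chi2 requires non-negative inputs
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
k = 10  # assumed number of features to keep after selection

for name, scorer in [("chi-square", chi2), ("information gain", mutual_info_classif)]:
    model = make_pipeline(MinMaxScaler(),
                          SelectKBest(score_func=scorer, k=k),
                          GaussianNB())
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name:>16}: accuracy = {acc.mean():.3f}, error rate = {1 - acc.mean():.3f}")
```

A proposed selector such as HW would slot into the same pipeline in place of SelectKBest, which is how the reduced feature sets can be compared on recall, precision, and error rate under identical conditions.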
