Abstract

With the rapid development of networks, intrusion detection has received increasing attention. Intrusion detection data are high-dimensional, class-imbalanced, and widely dispersed, all of which seriously degrade classification performance. To address these problems, this study proposes an anomaly detection model based on Boruta and extreme tree (Boruta-ET). First, the network traffic data are preprocessed: the data are cleaned, non-numeric fields are converted to numeric values, the features are normalized, and attack categories with few samples are balanced by random oversampling at the data level. Second, the dimensionality of the traffic features is reduced with a Boruta-based algorithm, which aims to identify all features relevant to the dependent variable from a global perspective and thereby obtain the optimal, most informative feature subset. Finally, this optimal feature subset is used as the input to the extreme tree (ET) model for training and testing. Experiments were conducted on the real network traffic dataset CICIDS2017, comparing the classification performance of several machine learning algorithms. The results show that the Boruta-ET model performs best, with an accuracy of 99.80%; it effectively improves the detection rate and achieves effective recall on attack types with few samples.
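The three stages described in the abstract (random oversampling, Boruta-style feature selection, extreme tree classification) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes NumPy and scikit-learn, uses synthetic data in place of CICIDS2017, and replaces Boruta with a simplified shadow-feature vote: full Boruta applies an iterative statistical test, for example as provided by `BorutaPy` in the `boruta` package. The function names `random_oversample` and `shadow_feature_mask`, the choice to oversample only the training split, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split


def random_oversample(X, y, rng):
    """Duplicate rows of smaller classes at random until all classes are equal in size."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(parts)
    return X[idx], y[idx]


def shadow_feature_mask(X, y, rng, n_rounds=10):
    """Simplified Boruta-style selection (assumption: majority vote, no binomial test).

    Each round appends a shuffled "shadow" copy of every feature, fits a forest,
    and counts a hit for each real feature that beats the best shadow importance.
    """
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for round_id in range(n_rounds):
        shadows = np.column_stack([rng.permutation(col) for col in X.T])
        forest = ExtraTreesClassifier(n_estimators=50, random_state=round_id)
        forest.fit(np.hstack([X, shadows]), y)
        importances = forest.feature_importances_
        real, shadow = importances[:n_features], importances[n_features:]
        hits += real > shadow.max()
    return hits > n_rounds // 2


rng = np.random.default_rng(0)

# Synthetic stand-in for traffic data: 8 features, only the first two carry
# signal, and the positive ("attack") class is rare.
X = rng.normal(size=(600, 8))
y = (X[:, 0] + X[:, 1] > 1.8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Stage 1: balance the classes (training split only, to keep the test set untouched).
X_bal, y_bal = random_oversample(X_train, y_train, rng)

# Stage 2: keep the features that consistently beat their shadow copies.
mask = shadow_feature_mask(X_bal, y_bal, rng)

# Stage 3: train and evaluate the extreme tree model on the selected features.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X_bal[:, mask], y_bal)
accuracy = model.score(X_test[:, mask], y_test)

print("selected features:", np.flatnonzero(mask))
print("test accuracy:", accuracy)
```

On imbalanced data such as this, accuracy alone can look good even when rare classes are missed, which is presumably why the abstract also reports recall on attack types with few samples; per-class recall would be the natural next metric to add to this sketch.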
