Efficient Detection of Botnet Traffic by Features Selection and Decision Trees

Javier Velasco-Mata, Victor Gonzalez-Castro, Eduardo Fidalgo Fernandez, Enrique Alegre

doi:10.1109/access.2021.3108222

Open Access

Journal: IEEE Access
Publication Date: Jan 1, 2021
Citations: 21
License type: CC BY-NC-ND 4.0

Affiliation: University of Leon

Abstract

Botnets are among the most prevalent online threats, causing losses in the billions to global economies. The growing number of devices connected to the Internet makes it necessary to analyze very large volumes of network traffic. In this work, we focus on improving the performance of botnet traffic classification by selecting the features that contribute most to the detection rate. For this purpose, we use two feature selection techniques, Information Gain and Gini Importance, which led to three pre-selected subsets of five, six and seven features. We then evaluate these three feature subsets with three models: Decision Tree, Random Forest and k-Nearest Neighbors. To test the three feature vectors and the three models, we generate two datasets based on the CTU-13 dataset, namely QB-CTU13 and EQB-CTU13. Finally, we measure performance as the macro-averaged F1 score over the computational time required to classify a sample. The results show that the highest performance is achieved by Decision Trees using the five-feature set, which obtained a mean F1 score of 85% while classifying each sample in an average time of 0.78 microseconds.
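The quantities named in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The function names, the toy flow values and the single-threshold split are assumptions made for illustration. In particular, Gini Importance is normally accumulated over all splits of a trained tree or forest, whereas this sketch scores one split only. The abstract also does not spell out the exact form of the final metric, so the plain ratio of macro F1 to seconds per sample used here is an assumption.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(values, labels, threshold, impurity):
    """Impurity decrease obtained by splitting one feature at `threshold`.

    With impurity=entropy this is the information gain of the split;
    with impurity=gini it is the Gini decrease of that single split.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = sum(len(part) / n * impurity(part)
                   for part in (left, right) if part)
    return impurity(labels) - weighted


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def f1_per_time(f1, seconds_per_sample):
    """Assumed form of the metric: macro F1 divided by time per sample."""
    return f1 / seconds_per_sample


# Toy data (hypothetical): one informative and one uninformative feature.
labels = ["normal", "normal", "botnet", "botnet"]
informative = [1, 2, 8, 9]
uninformative = [1, 9, 1, 9]

info_gain_good = split_gain(informative, labels, 5, entropy)
info_gain_bad = split_gain(uninformative, labels, 5, entropy)
gini_gain_good = split_gain(informative, labels, 5, gini)

y_true = ["botnet", "botnet", "normal", "normal"]
y_pred = ["botnet", "normal", "normal", "normal"]
score = macro_f1(y_true, y_pred)

# Figures reported in the abstract: F1 of 85% at 0.78 microseconds per sample.
ratio = f1_per_time(0.85, 0.78e-6)
```

Ranking features by such scores and keeping the top few mirrors the idea of pre-selecting small feature subsets. Dividing F1 by classification time rewards models that are both accurate and fast, which is how a simple Decision Tree can outrank slower models with similar F1.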

Full Text