Abstract

Distributed Denial of Service (DDoS) attacks pose a major challenge for today's security systems, given the variety of their implementations and the scale they can reach. One approach to detecting them early is to use Machine Learning (ML) techniques, which learn rules for classifying traffic from historical data. However, different features contribute unequally to the predictive performance of the trained model. Applying Feature Selection (FS) techniques as a pre-processing step identifies the most relevant features for the problem at hand. This reduces training time and can even improve performance when noisy variables are eliminated. The current work uses a public dataset and the XGBoost algorithm to measure the impact of FS techniques on the DDoS attack classification problem. We consider techniques that are independent of the sample labels as well as methods that use the labels to rank the variables in order of importance. We analyze the problem from the point of view of both binary and multiclass classification, and we build a benchmark of classification metrics and execution times. Our comparisons cover the Accuracy, Precision, Recall, and F1 Score obtained with each FS method, in addition to training and execution times. The results show that, for both the binary classifier (30% reduction in the number of features) and the multiclass classifier (40% reduction), the ANOVA method was the most advantageous.
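
Since ANOVA was the most advantageous method in both settings, a minimal pure-Python sketch of ANOVA-based ranking may help, assuming "ANOVA" refers to the usual one-way ANOVA F-test between each feature and the class label. For every feature it computes the F statistic across the classes and keeps the highest-scoring fraction of the original columns. This is not the authors' implementation: the helpers `anova_f` and `select_by_anova`, the feature names, and the toy values are all hypothetical. In practice, scikit-learn's `f_classif` (typically combined with `SelectKBest`) computes the same per-feature F statistic.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across the class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = mean(values)
    # Between-class sum of squares: how far each class mean lies from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-class sum of squares: spread of samples around their own class mean.
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values())
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_by_anova(rows, labels, names, keep_fraction):
    """Rank the original columns by F score and keep the top keep_fraction of them."""
    scores = {name: anova_f([row[j] for row in rows], labels)
              for j, name in enumerate(names)}
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    n_keep = max(1, round(len(names) * keep_fraction))
    return ranked[:n_keep], scores


# Hypothetical toy flows: feature names and values are illustrative only.
names = ["pkt_rate", "noise", "flow_duration"]
rows = [
    [10, 5, 100], [12, 1, 110], [11, 3, 90],    # benign flows
    [100, 4, 20], [110, 2, 25], [105, 3, 15],   # attack flows
]
labels = ["BENIGN"] * 3 + ["ATTACK"] * 3

# Keep 70% of the features, i.e. a 30% reduction as in the binary case.
selected, scores = select_by_anova(rows, labels, names, keep_fraction=0.7)
print(selected)  # ['pkt_rate', 'flow_duration']
```

The selected items are original column names, so the reduced model stays interpretable; the noise column, whose class means coincide, receives an F score of zero and is dropped first.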

Highlights

  • Distributed Denial of Service (DDoS) attacks are increasingly frequent and voluminous on the Internet

  • Confusion matrix outcomes: Attack classified as Attack, Benign classified as Attack, Attack classified as Benign, Benign classified as Benign

  • The CICDDoS2019 dataset was used as a basis for verifying the impact of Feature Selection (FS) on the DDoS attack classification quality metrics
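
The four outcomes (Attack classified as Attack, Benign classified as Attack, Attack classified as Benign, Benign classified as Benign) are the cells of a binary confusion matrix, and the Accuracy, Precision, Recall, and F1 Score compared in this work are derived from them. The sketch below is ours, not the authors' code; it treats Attack as the positive class. For the multiclass case it assumes one-vs-rest macro averaging, which the abstract does not specify, and the attack-type labels in the example are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count the four outcomes, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # e.g. Attack classified as Attack
        elif p == positive:
            fp += 1  # e.g. Benign classified as Attack
        elif t == positive:
            fn += 1  # e.g. Attack classified as Benign
        else:
            tn += 1  # e.g. Benign classified as Benign
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, Precision, Recall and F1 Score from the four counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def macro_metrics(y_true, y_pred):
    """Multiclass case: one-vs-rest metrics per class, averaged with equal class weight."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = [binary_metrics(*confusion_counts(y_true, y_pred, c)) for c in classes]
    return {m: sum(pc[m] for pc in per_class) / len(classes)
            for m in ("precision", "recall", "f1")}


# Binary example built from the four outcomes above.
y_true = ["Attack", "Attack", "Attack", "Benign", "Benign"]
y_pred = ["Attack", "Attack", "Benign", "Attack", "Benign"]
counts = confusion_counts(y_true, y_pred, positive="Attack")
print(counts)                   # (2, 1, 1, 1)
print(binary_metrics(*counts))  # accuracy 0.6; precision, recall and F1 about 2/3

# Multiclass example; the attack-type labels are hypothetical.
mc_true = ["BENIGN", "DNS", "DNS", "SYN"]
mc_pred = ["BENIGN", "DNS", "SYN", "SYN"]
print(macro_metrics(mc_true, mc_pred))  # macro F1 about 0.778 (7/9)
```

Macro averaging weights every class equally, so rare attack types count as much as frequent ones; a weighted average would instead favor the majority classes.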

Introduction

Distributed Denial of Service (DDoS) attacks are increasingly frequent and voluminous on the Internet. Thousands of attacks are launched against a wide range of targets: governments, e-commerce companies, telecommunications service providers, and multimedia content distributors, among others [1]. The motivations behind these attacks vary widely, from economic interests to political activism or even intellectual curiosity.

FS algorithms belong to the broader family of dimensionality reduction techniques. Their objective is to find a subset of input features that are strongly related to the target feature and weakly related to each other [12]. An FS technique is characterized by choosing a subset of the original variables, without transforming them or creating new variables. An unwanted consequence of methods that transform the original variables is that the new variables lack interpretability.
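
To make the criterion "related to the target, unrelated to each other" concrete, here is a minimal sketch of a greedy correlation filter. It is our illustrative choice, not a method taken from the cited work or from this study: using absolute Pearson correlation for both relevance and redundancy, the thresholds `min_relevance=0.1` and `max_redundancy=0.9`, the helper names, and the feature names are all assumptions. Note that it returns original column names rather than new combined variables, which is what preserves interpretability compared with transformation methods.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def relevance_redundancy_filter(columns, target, min_relevance=0.1, max_redundancy=0.9):
    """Greedy filter: keep original columns that track the target but not each other."""
    relevance = {name: abs(pearson(col, target)) for name, col in columns.items()}
    selected = []
    for name in sorted(columns, key=lambda c: relevance[c], reverse=True):
        if relevance[name] < min_relevance:
            break  # every remaining candidate is even less related to the target
        # Reject candidates that are near-copies of a feature already kept.
        if all(abs(pearson(columns[name], columns[kept])) < max_redundancy
               for kept in selected):
            selected.append(name)
    return selected


# Hypothetical toy data: 0 = benign, 1 = attack; names are illustrative only.
target = [0, 0, 0, 1, 1, 1]
columns = {
    "pkt_rate": [10, 12, 11, 100, 110, 105],
    "byte_rate": [20, 24, 22, 200, 220, 210],  # exactly 2 x pkt_rate: redundant
    "ttl": [64, 64, 128, 128, 64, 128],
    "noise": [5, 1, 3, 4, 2, 3],  # uncorrelated with the target
}
print(relevance_redundancy_filter(columns, target))  # one rate column, then 'ttl'
```

Only one of the two rate columns survives because they carry the same information, and the noise column is discarded for having no relation to the label.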
