Abstract

CSE-CIC-IDS2018 is an intrusion detection dataset containing roughly 16,000,000 normal and anomalous instances, about 17% of which represent attack traffic. Our big data study has two parts: ensemble feature selection and model comparison. In the first part, we select features from the dataset as input to the two classifiers, LightGBM and XGBoost, that we employ in the second part. In the second part, we evaluate the classifiers' performance with Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) and F1-score. The outcome of our experiments enables us to answer three research questions. The first question is, “Does feature selection impact performance of classifiers in terms of AUC and F1-score?” The second question is, “Does including the Destination_Port categorical feature significantly impact performance of LightGBM in terms of AUC and F1-score?” The third question is, “Does the choice of classifier, LightGBM or XGBoost, significantly impact performance in terms of AUC and F1-score?” For CSE-CIC-IDS2018, we conclude that feature selection and classifier choice impact performance scores, and that Destination_Port is a significant feature for LightGBM. In our case study, we present the application of an ensemble feature selection technique and an analysis of its impact. To the best of our knowledge, we are the first to apply this technique to the CSE-CIC-IDS2018 dataset.
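
To make the two evaluation metrics named above concrete, the following minimal Python sketch computes F1-score and AUC on a toy set of six instances. The labels, scores, the 0.5 decision threshold, and the helper names `f1_score` and `roc_auc` are illustrative assumptions, not the study's data or implementation. AUC is computed here through its pairwise-ranking interpretation, which is only practical for small inputs.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), with attack traffic as the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This O(P*N) form is for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = attack, 0 = normal traffic.
y_true = [0, 0, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7]        # assumed classifier outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("AUC:", roc_auc(y_true, scores))
print("F1-score:", f1_score(y_true, y_pred))
```

AUC depends only on how the scores rank the instances, whereas F1-score depends on the chosen threshold, which is one reason to report both on a class-imbalanced dataset.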
