Abstract

The importance of network security in today's information era cannot be overstated. The rapid increase in network-enabled devices has made the risk of network penetration greater than ever. Machine learning (ML) algorithms have recently attracted considerable interest in network security because of their rapid development and noteworthy results in a variety of fields. The network-based intrusion detection system (NIDS) is an essential component of network security and has great potential to serve as the final line of defense against intrusions in today's information and communication technology (ICT) age. Owing to the dynamic nature of attacks, intrusion detection datasets are made publicly available. This research used the CSE-CIC-IDS2018 on AWS dataset, which contains real-life modern network traffic. I applied several machine learning techniques to perform binary classification on this dataset, including logistic regression, random forest, and gradient boosting. I worked through the dataset in detail, covering data cleaning, pre-processing, feature engineering, and feature selection. The features selected after feature engineering were used for training, producing four classifiers: a baseline, logistic regression, random forest, and gradient boosting. The resulting models were compared comprehensively using evaluation metrics such as weighted-average recall and weighted-average precision. Gradient boosting outperformed the other models on every measure, with precision, recall, and F1-score of 0.98. The model was also evaluated on the test dataset and achieved similar values, which indicates that it performs well on data it has not seen before.
Keywords: intrusion detection system (IDS), machine learning, network security, network-based intrusion detection system, logistic regression, random forest, gradient boosting, recall, precision, precision-recall curve.
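
The weighted-average precision, recall, and F1-score named in the abstract can be illustrated with a minimal sketch. It assumes the usual support-weighted definition, in which each class's score is weighted by its share of the true labels. The function names (`weighted_scores`, `majority_baseline`) and the toy labels are hypothetical. They are not taken from the paper, and the baseline shown here (always predicting the most frequent class) is only one plausible reading of the "baseline" classifier.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted (precision, recall, F1) over all true classes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    support = Counter(y_true)      # number of true samples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        # Per-class scores; a class that is never predicted gets precision 0.
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / n
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n / total         # the class's share of the true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


def majority_baseline(y_train, n):
    """Predict the most frequent training label for each of n samples."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * n


# Hypothetical toy labels (not from the dataset): 0 = benign, 1 = attack.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_model = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
y_base = majority_baseline(y_true, len(y_true))

model_scores = weighted_scores(y_true, y_model)
base_scores = weighted_scores(y_true, y_base)
print("model   :", model_scores)
print("baseline:", base_scores)
```

On imbalanced traffic, a majority-class baseline can reach a moderate weighted recall while its weighted precision and F1 stay low, which is one reason to report several metrics side by side when comparing classifiers.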
