Abstract

When analyzing cybersecurity datasets with machine learning, researchers commonly need to decide whether to include Destination Port as an input feature. We assess the impact of Destination Port as a predictive feature by building models with three different input feature sets and four combinations of web attacks from the CSE-CIC-IDS2018 dataset. First, we use Destination Port as the single input feature. Second, we use all CSE-CIC-IDS2018 features except Destination Port. Third, we use all features including Destination Port. With LightGBM and CatBoost classifiers, all three feature sets detect web attacks with Area Under the Receiver Operating Characteristic Curve (AUC) scores exceeding 0.90 in every scenario. Classification performance is best when Destination Port is combined with all of the other CSE-CIC-IDS2018 features, although it remains respectable when Destination Port is the only input feature. Additionally, we confirm that Botnet attacks are also detected with respectable AUC scores when Destination Port is the only input feature. These results highlight that practitioners must be mindful of whether to include Destination Port as an input feature when its values have lopsided label distributions, as we clearly identify in this study. Our brief survey of existing CSE-CIC-IDS2018 literature also found that many studies incorrectly treat Destination Port as a numerical input feature. Destination Port should be treated as a categorical input feature, because port numbers are identifiers rather than quantities, and arithmetic on them is not meaningful to the models.
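The two observations about Destination Port (a lopsided label distribution per port, and the categorical versus numerical treatment) can be illustrated with a minimal sketch. It is not the study's LightGBM or CatBoost pipeline. The toy flows, the helper names `port_attack_rates` and `auc`, and the choice to score each flow by its port's in-sample attack rate are all assumptions made for illustration, and the sketch evaluates on the same rows it fits to.

```python
from collections import defaultdict


def port_attack_rates(ports, labels):
    """Fraction of attack-labeled flows per port, treating each port as a category."""
    counts = defaultdict(lambda: [0, 0])  # port -> [n_flows, n_attacks]
    for port, label in zip(ports, labels):
        counts[port][0] += 1
        counts[port][1] += label
    return {port: attacks / total for port, (total, attacks) in counts.items()}


def auc(scores, labels):
    """AUC as the chance a random attack outranks a random normal flow (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy flows, NOT taken from CSE-CIC-IDS2018: 1 = attack, 0 = normal.
ports = [80, 80, 80, 80, 443, 443, 443, 53, 53, 3389]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

rates = port_attack_rates(ports, labels)

# Categorical treatment: each port is a label with its own attack rate.
categorical_scores = [rates[p] for p in ports]
# Numerical treatment: the raw port number is misused as a magnitude.
numeric_scores = [float(p) for p in ports]

auc_categorical = auc(categorical_scores, labels)
auc_numeric = auc(numeric_scores, labels)
print(f"categorical: {auc_categorical:.3f}  numeric: {auc_numeric:.3f}")
```

In this toy data every attack arrives on port 80, so the per-port attack rate alone separates the classes well, while ranking flows by the raw port number does not. This is because the ordering of port numbers carries no meaning. Tree-based models can partly recover from a numeric encoding through repeated splits, so the gap in a real pipeline may be smaller than in this sketch.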
