Abstract
The use of machine learning techniques for botnet detection has been an active area of research in security field for some years now. Some of the past machine learning-based botnet detection studies used datasets that were generated synthetically. The release of a large and real-life botnet dataset, named CTU-13, allowed researchers to build machine learning-based models from real-world data. In fact, the real-life traces in the dataset makes it more promising for being used for botnet identification studies. The current study proposed the use of a single tree-based learning algorithm in the classification of botnet evidence from sub-sampled portion of three captures in CTU-13 dataset. Random sub-sampling was used to arrive at three different datasets that was used in the study. The first step in the methodology involved experimental analyses on three captures out of thethirteen in the whole dataset. The analyses revealed the basic characteristics of the datasets which further guided the study further. The missing values and categorical data types in the dataset were handled through mixed imputation and feature encoding, respectively. The big data nature of the dataset was handled through random sub-sampling technique with a view to building a botnet detection model that is less computationally intensive. The random sub-sampling technique was used without changing the data distributions in thedataset. The botnet detection models were built by using decision-tree algorithm from the three sub-sampled dataset captures. The performances of the models were evaluated by using accuracy, precision, recall, and f1-score, respectively. In all, the model built with scenario5 capture slightly performed better than the ones built using scenario 6 and scenario 7 captures, respectively.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.