Abstract

In recent years, botnets have become one of the major threats to information security because they have been constantly evolving in both size and sophistication. A number of botnet detection approaches, such as honeynet-based and Intrusion Detection System (IDS)-based ones, have been proposed. However, signature-based IDS solutions appear to be ineffective because recent botnets are equipped with sophisticated code update and evasion techniques. A number of studies have shown that anomaly-based botnet detection methods are more effective than signature-based methods because they do not require pre-built botnet signatures and hence can detect new or unknown botnets. In this direction, this paper proposes a botnet detection model based on machine learning using Domain Name Service (DNS) query data and evaluates its effectiveness using popular machine learning techniques. Experimental results show that machine learning algorithms can be used effectively in botnet detection and that the random forest algorithm produces the best overall detection accuracy, at over 90%.

Highlights

  • In recent years, botnets have been considered one of the major security threats among all types of malware operating on the Internet [1,2]

  • This paper examines and evaluates the effectiveness of a botnet detection method that uses Domain Name Service (DNS) query data, based on a number of commonly used machine learning techniques, including k-nearest neighbor (kNN), decision trees, random forest and Naïve Bayes

  • We examine only the effectiveness of supervised learning techniques in botnet detection; the corresponding subsection briefly describes some common supervised machine learning algorithms, including k-nearest neighbor, decision tree, random forest, and Naïve Bayes

Summary

Introduction

Botnets have been considered one of the major security threats among all types of malware operating on the Internet [1,2]. Bots differ from other forms of malware in that they are highly autonomous and can use communication channels to receive commands and code updates from their control system, the command and control (C&C) servers. They can also report their working status to the control system periodically. Bots are equipped with the ability to generate C&C server hostnames automatically, so they can still find the IP addresses of C&C servers by querying the DNS service with these generated hostnames.
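
As a rough illustration of why automatically generated hostnames may be distinguishable from ordinary ones in DNS query data, the sketch below computes a few simple lexical features of a queried hostname. The function names, the choice of the label immediately left of the top-level domain, and the specific features (length, vowel ratio, distinct bi-gram ratio) are assumptions made for this example. They are simplified stand-ins and not the paper's exact bi-gram, tri-gram, or vowel distribution features.

```python
VOWELS = set("aeiou")


def second_level_label(hostname: str) -> str:
    """Return the label just left of the top-level domain.

    Assumption for this sketch: the last label is the TLD, so
    'www.example.com' yields 'example'. Multi-part suffixes such as
    'co.uk' are not handled.
    """
    labels = hostname.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def vowel_ratio(label: str) -> float:
    """Fraction of alphabetic characters in the label that are vowels."""
    letters = [c for c in label if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in VOWELS for c in letters) / len(letters)


def ngrams(label: str, n: int) -> list:
    """All contiguous character n-grams of the label, in order."""
    return [label[i:i + n] for i in range(len(label) - n + 1)]


def lexical_features(hostname: str) -> dict:
    """Hypothetical feature vector for one DNS query name."""
    label = second_level_label(hostname)
    bigrams = ngrams(label, 2)
    return {
        "length": len(label),
        "vowel_ratio": vowel_ratio(label),
        # Share of bi-grams that are distinct within the label.
        "distinct_bigram_ratio": (
            len(set(bigrams)) / len(bigrams) if bigrams else 0.0
        ),
    }


if __name__ == "__main__":
    # Made-up hostnames, used only to exercise the functions.
    for name in ("www.google.com", "xkqzvbnt.net"):
        print(name, lexical_features(name))
```

Feature vectors of this kind, labeled as benign or botnet-generated, could then be passed to a supervised classifier such as kNN, a decision tree, random forest, or Naïve Bayes.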

Related Works
Introduction to Machine Learning
Common Supervised Machine Learning Techniques
Decision tree
Random forest
Naïve Bayes
Experimental Dataset
Data Pre-Processing
Features of Bi-gram and Tri-gram Clusters
Vowel Distribution Features
Experimental Scenarios
Classification Measures
Experimental Results
Comments
Conclusions