Abstract

Botnets often use domain name generation algorithms (DGAs) to evade detection: they generate large numbers of pseudo-random domain names, of which cybercriminals register only a few. In this paper, we address how DGA-generated domain names can be detected by means of machine learning and deep learning. We first present an extensive literature review of recent work in which machine learning and deep learning have been applied to detect DGA-generated domain names. We observe that a common methodology is still missing and that the use of different datasets makes experimental results hard to compare. We next propose using TF-IDF to measure the frequencies of the most relevant n-grams in domain names, and we use these frequencies as features in learning algorithms. We perform experiments with various machine-learning and deep-learning models using TF-IDF features, of which a deep MLP model yields the best results. For comparison, we also apply an LSTM model with an embedding layer that converts domain names from a sequence of characters into a vector representation. The performance of our LSTM and MLP models is rather similar: they achieve AUCs of 0.994 and 0.995 and average F1-scores of 0.907 and 0.891, respectively.
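
To make the TF-IDF idea concrete, the following minimal sketch computes TF-IDF weights over character n-grams of domain names using only the Python standard library. It is an illustration under stated assumptions, not the configuration used in the paper: the bigram size, the unsmoothed IDF formula log(N/df), the helper names `char_ngrams` and `tfidf_features`, and the sample domains are all hypothetical, and the selection of the most relevant n-grams is not shown.

```python
import math
from collections import Counter


def char_ngrams(domain, n=2):
    """Return the overlapping character n-grams of a domain label."""
    return [domain[i:i + n] for i in range(len(domain) - n + 1)]


def tfidf_features(domains, n=2):
    """Compute one sparse TF-IDF vector (dict of n-gram -> weight) per domain.

    Assumption: plain TF (count / length) times unsmoothed IDF log(N / df).
    """
    docs = [Counter(char_ngrams(d, n)) for d in domains]
    # Document frequency: in how many domains each n-gram occurs.
    df = Counter(gram for doc in docs for gram in doc)
    total = len(docs)
    vectors = []
    for doc in docs:
        length = sum(doc.values())
        vectors.append({
            gram: (count / length) * math.log(total / df[gram])
            for gram, count in doc.items()
        })
    return vectors


# Hypothetical sample: two word-like names and one random-looking name.
sample = ["google", "goodle", "xkqzvw"]
features = tfidf_features(sample, n=2)
```

In this toy sample, n-grams shared by several names (such as "go") receive a lower weight than n-grams unique to a single name (such as "og" or "xk"). The resulting vectors could then serve as input features to a classifier such as an MLP.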

Highlights

  • Botnets pose a severe threat to the security of systems connected to the Internet and their users

  • We provide an extensive literature review of recent work in which machine learning and deep learning have been applied to detect botnets based on domain name generation algorithms (DGAs)

  • We provide experimental results using term frequency-inverse document frequency (TF-IDF) features with the most popular machine-learning algorithms

Summary

Introduction

Botnets pose a severe threat to the security of systems connected to the Internet and their users. By updating the malware running on the bots, the botmaster can configure the botnet to perform different types of attacks, such as launching DDoS attacks, sending spam, or stealing credentials. Because of this versatility, botnets are regarded as the Swiss army knife of cybercriminals. Command-and-control (C&C) servers and the communication channels between botmaster and bots are critical components of a botnet. Numerous techniques have been applied to provide stealthy botnet operation and to increase resilience against take-down attempts [1]. For example, the IP address of the C&C server can be hardcoded in the bot malware. This offers stealthy botnet operation since no DNS lookup is required.
