Detection of DGA-Generated Domain Names with TF-IDF

Harald Vranken, Hassan Alizadeh

doi:10.3390/electronics11030414

Open Access

https://doi.org/10.3390/electronics11030414

Abstract

Botnets often apply domain name generation algorithms (DGAs) to evade detection by generating large numbers of pseudo-random domain names, of which only a few are registered by cybercriminals. In this paper, we address how DGA-generated domain names can be detected by means of machine learning and deep learning. We first present an extensive literature review of recent work in which machine learning and deep learning have been applied to detect DGA-generated domain names. We observe that a common methodology is still missing, and that the use of different datasets makes experimental results hard to compare. We next propose the use of TF-IDF to measure the frequencies of the most relevant n-grams in domain names, and use these as features in learning algorithms. We perform experiments with various machine-learning and deep-learning models using TF-IDF features, of which a deep MLP model yields the best results. For comparison, we also apply an LSTM model with an embedding layer that converts domain names from a sequence of characters into a vector representation. The LSTM and MLP models perform similarly, achieving AUCs of 0.994 and 0.995 and average F1-scores of 0.907 and 0.891, respectively.
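The idea of using TF-IDF weights of character n-grams as features can be illustrated with a minimal sketch. It is not the authors' implementation: the plain `tf * log(N / df)` weighting, the n-gram range, the function names `char_ngrams` and `tfidf_features`, and the toy domain names are all assumptions made for illustration, and the selection of the "most relevant" n-grams is omitted.

```python
import math
from collections import Counter


def char_ngrams(domain, n_min=1, n_max=3):
    """Return all character n-grams of a domain name for n in [n_min, n_max]."""
    return [domain[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(domain) - n + 1)]


def tfidf_features(domains, n_min=1, n_max=3):
    """Compute TF-IDF weights of character n-grams, one dict per domain.

    Assumed weighting: tf = count / total n-grams in the domain,
    idf = log(N / df), with no smoothing or normalization.
    """
    docs = [Counter(char_ngrams(d, n_min, n_max)) for d in domains]
    # Document frequency: number of domains that contain each n-gram.
    df = Counter(g for doc in docs for g in doc)
    n_docs = len(docs)
    features = []
    for doc in docs:
        total = sum(doc.values())
        features.append({
            g: (count / total) * math.log(n_docs / df[g])
            for g, count in doc.items()
        })
    return features


# Hypothetical toy input: two dictionary-like names and one random-looking name.
domains = ["google", "github", "xkqzjvwp"]
features = tfidf_features(domains, n_min=2, n_max=2)
```

Each resulting dictionary maps an n-gram to its weight for one domain name. In practice these sparse weights would be arranged into a fixed-length vector over a chosen n-gram vocabulary before being passed to a classifier.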

Highlights

Botnets pose a severe threat to the security of systems connected to the Internet and their users
We provide an extensive literature review of recent work in which machine learning and deep learning have been applied to detect botnets based on domain name generation algorithms (DGAs)
We provide experimental results using term frequency-inverse document frequency (TF-IDF) features with the most popular machine-learning algorithms

Summary

Introduction

Botnets pose a severe threat to the security of systems connected to the Internet and their users. By updating the malware running on the bots, the botmaster can configure the botnet to perform different types of attacks, such as launching DDoS attacks, sending spam, or stealing credentials. This versatility has led botnets to be regarded as the Swiss army knife of cybercriminals. Command-and-control (C&C) servers and the communication channels between botmaster and bots are critical components of a botnet. Numerous techniques have been applied to provide stealthy botnet operation and to increase resilience against take-down attempts [1]. The IP address of the C&C server can be hardcoded in the bot malware. This offers stealthy botnet operation since no DNS lookup is required.

Methods

Results

Discussion

Conclusion