Real-Time Detection of Dictionary DGA Network Traffic Using Deep Learning

Kate Highnam, Song Luo, Nicholas R Jennings, Domenic Puzio

doi:10.1007/s42979-021-00507-w

Open Access

https://doi.org/10.1007/s42979-021-00507-w

Journal: SN Computer Science	Publication Date: Feb 22, 2021
Citations: 34	License type: open-access

Affiliation: Imperial College London, Tencent (China)

Abstract

Botnets and malware continue to avoid detection by static rule engines when using domain generation algorithms (DGAs) for callouts to unique, dynamically generated web addresses. Common DGA detection techniques fail to reliably detect DGA variants that combine random dictionary words to create domain names that closely mirror legitimate domains. To combat this, we created a novel hybrid neural network, Bilbo the “bagging” model, that analyses domains and scores the likelihood they are generated by such algorithms and therefore are potentially malicious. Bilbo is the first parallel usage of a convolutional neural network (CNN) and a long short-term memory (LSTM) network for DGA detection. Our unique architecture is found to be the most consistent in performance in terms of AUC, F1 score, and accuracy when generalising across different dictionary DGA classification tasks compared to current state-of-the-art deep learning architectures. We validate using reverse-engineered dictionary DGA domains and detail our real-time implementation strategy for scoring real-world network logs within a large enterprise. In 4 hours of actual network traffic, the model discovered at least five potential command-and-control networks that commercial vendor tools did not flag.

Highlights

Malware continues to pose a serious threat to individuals and corporations alike [1].
The convolutional neural network (CNN) and long short-term memory (LSTM) models are statistically similar in all metrics, with the LSTM outperforming the CNN on most precision, true-positive rate (TPR), and false-positive rate (FPR) measurements.
We found that suppobox contained the longest substrings, indicating that models which learn the long sequences of suppobox’s dictionary words would have an advantage when classifying the majority of dictionary domain generation algorithm (DGA) domains.
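
The precision, TPR, FPR, F1 score, and accuracy named above all derive from the same four confusion-matrix counts. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function names and the sample labels are hypothetical, with 1 meaning "DGA" and 0 meaning "benign".

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # DGA domains correctly flagged as DGA
    fp: int  # benign domains wrongly flagged as DGA
    tn: int  # benign domains correctly passed
    fn: int  # DGA domains that were missed


def confusion_counts(labels, predictions):
    """Count outcomes for binary labels (1 = DGA, 0 = benign)."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(labels, predictions):
        if truth == 1 and guess == 1:
            tp += 1
        elif truth == 0 and guess == 1:
            fp += 1
        elif truth == 0 and guess == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c):
    """Standard definitions of the metrics named in the highlights."""
    precision = safe_div(c.tp, c.tp + c.fp)
    tpr = safe_div(c.tp, c.tp + c.fn)  # also called recall
    fpr = safe_div(c.fp, c.fp + c.tn)
    f1 = safe_div(2 * precision * tpr, precision + tpr)
    accuracy = safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    return {
        "precision": precision,
        "tpr": tpr,
        "fpr": fpr,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical labels and predictions, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_counts(labels, predictions)
results = classification_metrics(counts)
```

In a real-time setting the FPR matters as much as the TPR, because benign domains vastly outnumber DGA domains in enterprise traffic and even a small FPR can produce many alerts.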

Summary

Introduction

Malware continues to pose a serious threat to individuals and corporations alike [1]. Typical attack methods such as viruses, phishing emails, and worms attempt to retrieve private user data, destroy systems, or start unwanted programs. The destination (domain or IP address) of the malware's communication channel can be hard-coded in the malware itself, making its location discoverable via reverse engineering or straightforward log aggregation techniques. Once known, this domain or IP address can be blacklisted, rendering the malware inert. To evade blacklisting, malware families employ domain generation algorithms (DGAs) to create pseudo-random domains for use in communication. These domains are used for short periods of time and then phased out in favour of newly generated domains; this quick turnover means that manual techniques are not effective. For the vast majority of malware samples, traffic related to malicious activity is present in networks weeks or months before the malware is analysed and blacklisted [7].
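
To make the turnover concrete, here is a toy dictionary DGA, sketched under stated assumptions. It is not a reverse-engineered algorithm from any real malware family (suppobox included): the word list, the date-based seed, the SHA-256 hashing scheme, and the `.net` suffix are all hypothetical choices made for illustration.

```python
import datetime
import hashlib

# Hypothetical word list; real dictionary DGAs embed their own lists.
WORDS = [
    "river", "stone", "garden", "market", "window", "silver",
    "winter", "bridge", "forest", "letter", "harbor", "candle",
]


def dictionary_dga(day, count=5, words=WORDS, tld=".net"):
    """Derive `count` two-word domains deterministically from a date.

    Anyone who knows the algorithm and the date can regenerate the same
    list, so the infected host and its operator never need to exchange
    the domain names themselves.
    """
    domains = []
    for index in range(count):
        seed = f"{day.isoformat()}-{index}".encode("utf-8")
        digest = hashlib.sha256(seed).digest()
        first = words[digest[0] % len(words)]
        second = words[digest[1] % len(words)]
        domains.append(first + second + tld)
    return domains


today = datetime.date(2021, 2, 22)
tomorrow = today + datetime.timedelta(days=1)
todays_domains = dictionary_dga(today)
tomorrows_domains = dictionary_dga(tomorrow)
```

Because the output is built from ordinary words, the domains resemble legitimate names, and a blacklist compiled from one day's list gives no guarantee of covering the next day's.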

Similar Papers

Improved DGA Domain Names Detection and Categorization Using Deep Learning Architectures with Classical Machine Learning Algorithms
R Vinayakumar ... Mohamed Elhoseny
01 Jan 2019

DeepDGA-MINet: Cost-Sensitive Deep Learning Based Framework for Handling Multiclass Imbalanced DGA Detection
R Vinayakumar ... Prabaharan Poornachandran
01 Jan 2020

Inline Detection of Domain Generation Algorithms with Context-Sensitive Word Embeddings
Joewie J Koh ... Barton Rhodes
21 Nov 2018

Far from Classification Algorithm: Dive into the Preprocessing Stage in DGA Detection
Mingkai Tong ... Jiahai Yang
01 Dec 2020
