A Method with Pre-trained Word Vectors for Detecting Wordlist-based Malicious Domain Names

Shaoqing Lin, Kaizhi Cheng, Shangping Zhong

doi:10.1088/1742-6596/1757/1/012171


Open Access

https://doi.org/10.1088/1742-6596/1757/1/012171


Journal: Journal of Physics: Conference Series	Publication Date: Jan 1, 2021
Citations: 1	License type: cc-by

Affiliation: Fuzhou University

Abstract

In recent years, botnets have used domain generation algorithms (DGAs) to produce dynamic, wordlist-based malicious domain names that bypass various detection methods. Existing deep learning detection models generally pad each domain name and transform it into a fixed-length one-dimensional vector before classifying it, which yields poor detection performance on such domain names. This study therefore first splits the domain name into an array of words and converts the words into word vectors using a pre-trained word vector model, Embeddings from Language Models (ELMo). The resulting representation is then fed into a TextCNN model for training and classification. On a data set of approximately 100,000 samples, the method achieves an accuracy of 94.22% and a false positive rate (FPR) of 6.87%. Compared with previous detection models (i.e., LSTM and CNN), the method requires more training and testing, but it improves on all indicators.
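The first step described in the abstract, splitting a domain name into an array of words, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy wordlist, the function names `segment_label` and `domain_to_words`, the penalty for unknown characters, and the choice to segment only the leftmost label are all assumptions made for illustration. The sketch uses dynamic programming to cover the label with as few dictionary words as possible, falling back to single characters where no word matches.

```python
from typing import List, Set

# Hypothetical toy wordlist; a real system would need a much larger dictionary.
TOY_WORDLIST: Set[str] = {"bank", "secure", "login", "pay", "online", "mail"}


def segment_label(label: str, wordlist: Set[str]) -> List[str]:
    """Split a label into tokens, preferring dictionary words.

    Each dictionary word costs 1; each single character that is not a
    dictionary word costs 2 (an assumed penalty). The segmentation with
    the lowest total cost is returned.
    """
    n = len(label)
    max_len = max((len(w) for w in wordlist), default=1)
    inf = float("inf")
    cost = [0.0] + [inf] * n   # cost[i]: best cost for label[:i]
    back = [0] * (n + 1)       # back[i]: start index of the last token
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = label[j:i]
            if piece in wordlist:
                candidate = cost[j] + 1
            elif i - j == 1:
                candidate = cost[j] + 2
            else:
                continue
            if candidate < cost[i]:
                cost[i] = candidate
                back[i] = j
    # Walk the back-pointers from the end to recover the tokens.
    tokens: List[str] = []
    i = n
    while i > 0:
        j = back[i]
        tokens.append(label[j:i])
        i = j
    return tokens[::-1]


def domain_to_words(domain: str, wordlist: Set[str] = TOY_WORDLIST) -> List[str]:
    """Segment the leftmost label of a domain (an assumed simplification)."""
    label = domain.lower().split(".")[0]
    return segment_label(label, wordlist)
```

Under these assumptions, `domain_to_words("securebanklogin.com")` yields the word array `["secure", "bank", "login"]`. An array of this kind is what would then be mapped to pre-trained word vectors and passed to the classifier.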

Full Text