Abstract

Identifying malicious domain names in Internet activity has become an effective way to protect Internet users. Previous work has achieved strong identification results, but it relies heavily on historical Domain Name System (DNS) responses and external intelligence sources. Such approaches may therefore fail to identify unknown domain names for which no prior knowledge is available. In this paper, we propose Glacier, a feature ensemble-based approach to identifying malicious domain names from valid DNS responses. Glacier addresses this problem by using two types of features derived from domain name strings: linguistic features and statistical features. (1) Linguistic features are vector representations generated from the character sequences of domain names by a bidirectional long short-term memory (BiLSTM) neural network. Notably, we modify the last BiLSTM layer to enhance the expressiveness of the linguistic features. (2) Statistical features are six manually designed statistics that represent the structural information of a domain name. Because a BiLSTM neural network can hardly learn structural information directly, combining statistical features with linguistic features improves the effectiveness of malicious domain name identification. We evaluate the identification ability of Glacier on a real-world domain name data set. At its best, Glacier achieves an average accuracy of 90.86% and an average F1-score of 84.37%. Our experimental results show that Glacier can accurately identify resolvable malicious domain names without any DNS traffic data or prior knowledge about unknown domain names.
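
To make the feature ensemble idea concrete, the following minimal sketch concatenates a learned embedding of a domain name with hand-crafted structural statistics. The abstract does not list Glacier's six statistics, so the six computed here (length, label count, longest label, digit ratio, hyphen count, character entropy) are hypothetical stand-ins, as are the function names `structural_stats` and `ensemble_features`. The embedding is passed in as a plain list of floats and stands in for the output of a BiLSTM that is not implemented here.

```python
import math
from collections import Counter
from typing import List


def structural_stats(domain: str) -> List[float]:
    """Return six illustrative structural statistics of a domain name.

    Assumption: these are NOT the statistics used by Glacier; the abstract
    only states that there are six manually designed ones.
    """
    name = domain.lower().rstrip(".")
    labels = name.split(".")
    chars = name.replace(".", "")
    n = len(chars)
    counts = Counter(chars)
    # Shannon entropy of the character distribution (0.0 for an empty name).
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    return [
        float(len(name)),                                      # total length, dots included
        float(len(labels)),                                    # number of labels
        float(max(len(label) for label in labels)),            # longest label
        (sum(ch.isdigit() for ch in chars) / n) if n else 0.0,  # digit ratio
        float(name.count("-")),                                # hyphen count
        entropy,                                               # character entropy
    ]


def ensemble_features(embedding: List[float], domain: str) -> List[float]:
    """Concatenate a learned embedding with the structural statistics."""
    return list(embedding) + structural_stats(domain)
```

A downstream classifier would then be trained on the concatenated vector, for example `ensemble_features(embedding, "a1.com")`, where `embedding` is whatever vector the sequence model produces for that name.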
