A semantic element representation model for malicious domain name detection

Luhui Yang, Guangjie Liu, Jinwei Wang, Jiangtao Zhai, Yuewei Dai

doi:10.1016/j.jisa.2022.103148

Abstract

Existing methods for detecting algorithmically generated malicious domain names lack a theoretical model of how domain names are composed of their constituent elements. To address this problem, a semantic element representation model for domain names is constructed from a set of domain name semantic elements and a probabilistic context-free grammar (PCFG) model. The model first analyses and categorises the constituent elements of a domain name. It then introduces a syntax tree analysis method for the semantic relationships between these elements, which allows multiple kinds of elements within a domain name to be represented efficiently. Based on the proposed model, malicious domain names are divided into four categories: random character-based, word-based, predicted character-based, and multi-element hybrid. Experiments analysing the anomaly and concealment characteristics of domain names show significant differences both between malicious and legitimate domain names and among malicious domain names themselves. Comparative experiments show that the proposed model effectively improves detection accuracy for malicious domain names.
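The abstract describes representing a domain name as a derivation over semantic element types under a probabilistic context-free grammar. The sketch below illustrates only that general idea, not the authors' model. The element tags (`W` for dictionary word, `D` for digit run, `R` for random characters), the rule probabilities in `RULES`, the tiny word list `WORDS`, and the greedy longest-match segmenter are all hypothetical choices made for illustration. The grammar is right-recursive, so each tag sequence has exactly one derivation. Its probability is therefore the product of the rule probabilities used.

```python
import math

# Hypothetical toy PCFG (NOT the paper's grammar). A label L is a sequence of
# elements E, and each element is tagged W (word), D (digits) or R (random).
#   L -> E L (0.4) | E (0.6)
#   E -> W (0.6) | D (0.1) | R (0.3)
RULES = {
    ("L", "E L"): 0.4,
    ("L", "E"): 0.6,
    ("E", "W"): 0.6,
    ("E", "D"): 0.1,
    ("E", "R"): 0.3,
}

# Hypothetical lexicon standing in for a real word dictionary.
WORDS = {"mail", "secure", "bank", "login", "online"}


def segment(label: str) -> list[tuple[str, str]]:
    """Greedy longest-match segmentation of a label into (tag, text) elements."""
    elements: list[tuple[str, str]] = []
    random_buf = ""
    i = 0

    def flush() -> None:
        # Emit any pending run of unmatched characters as one R element.
        nonlocal random_buf
        if random_buf:
            elements.append(("R", random_buf))
            random_buf = ""

    while i < len(label):
        if label[i].isdigit():
            # Consume a maximal run of digits as one D element.
            j = i
            while j < len(label) and label[j].isdigit():
                j += 1
            flush()
            elements.append(("D", label[i:j]))
            i = j
            continue
        # Longest dictionary word starting at position i, if any.
        match = max((w for w in WORDS if label.startswith(w, i)), key=len, default=None)
        if match:
            flush()
            elements.append(("W", match))
            i += len(match)
        else:
            random_buf += label[i]
            i += 1
    flush()
    return elements


def log_prob(tags: list[str]) -> float:
    """Log-probability of the unique derivation of a tag sequence."""
    n = len(tags)
    if n == 0:
        return float("-inf")
    # (n - 1) applications of L -> E L, one of L -> E, and one E -> tag per element.
    lp = (n - 1) * math.log(RULES[("L", "E L")]) + math.log(RULES[("L", "E")])
    lp += sum(math.log(RULES[("E", t)]) for t in tags)
    return lp


if __name__ == "__main__":
    for name in ["securelogin", "xkq7zv"]:
        tags = [t for t, _ in segment(name)]
        print(name, segment(name), round(log_prob(tags), 3))
```

With these made-up weights, `securelogin` segments into two words and receives a derivation probability of 0.0864. `xkq7zv` segments into `R D R` and receives 0.000864. This only shows how a grammar over element types can separate word-like labels from random-looking ones. It is not a detection result, and a realistic model would also assign probabilities to the terminal strings inside each element.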

Full Text