Online malicious domain name detection with partial labels for large-scale dependable systems

Yongqian Sun, Kunlin Jian, Liyue Cui, Guifei Jiang, Shenglin Zhang, Yuzhi Zhang, Dan Pei

doi:10.1016/j.jss.2022.111322

Abstract

Detecting malicious non-existent domain names (NXDomains) in real time is vital to the security of large-scale dependable systems. Existing detection methods are trained on the assumption that NXDomains not recognized by the domain generation algorithm (DGA) archive are benign. However, new types of malicious NXDomains are generated continuously, and the DGA archive cannot cover all of them, so the NXDomains are only partially labeled. Additionally, extracting all the features for distinguishing malicious from benign NXDomains is computationally inefficient and therefore unsuitable for online detection in large-scale dependable systems. This work proposes a framework, PUFS, that trains an accurate malicious NXDomain detection model from partial labels and performs efficient online detection for large-scale dependable systems. PUFS adopts a novel, simple, yet effective three-step strategy that combines positive-unlabeled (PU) learning with feature selection. We conduct extensive experiments on real-world data collected from a top-tier global online bank. PUFS achieves an F1-score of 99.19% and improves feature extraction efficiency by 1153%, making it suitable for online detection scenarios.

Full Text