The Domain Name System (DNS) is a fundamental component of Internet infrastructure, but attackers also exploit it in various cybercrimes, underscoring the significance of malicious domain detection (MDD). Recent advances show that graph-based models exhibit potential for inferring malicious domains and demonstrate superior performance. However, acquiring large-scale, high-quality graph datasets for MDD proves challenging for individual security institutes. Hence, a promising research direction is to employ a vertical federated graph learning scheme that unites diverse security institutes and enhances their local datasets, resulting in more robust and powerful detection models. Nonetheless, directly applying vertical federated graph neural networks to MDD is hindered by noisy labels and noisy edges among security institutes, which ultimately diminish detection performance. This paper introduces a novel vertical federated learning framework, called MDD-FedGNN, that applies contrastive learning with two different encoders to deal with noisy labels and employs a new loss function based on information bottleneck theory to handle noisy edges. Comparative experiments are conducted on a publicly available DNS dataset to evaluate the effectiveness of MDD-FedGNN in addressing the challenges of noisy labels and edges in vertical federated graph learning. The results demonstrate that MDD-FedGNN outperforms baseline methods, confirming the feasibility of training more powerful malicious domain detection models through data sharing and vertical federated learning among different security agencies.