Abstract Botnets currently use domain-generation algorithms to produce fast-flux domains that enable them to evade detection. Accurately categorizing these botnet domains is crucial to develop cybersecurity solutions against botnet threats. However, existing methods, requiring labeled data, are ineffective against new botnets. To address this issue, we propose Domain2Vec, a metric learning-based approach that can explore new botnets. Domain2Vec integrates a framework of metric learning, which uses individual domains from known botnets for categorization of unknown botnet domains. The training involves an attention-based encoder, and it includes a constraint to ensure that samples with the same labels are closer in the embedding space. The categorization uses the encoder to project domain names into appropriate representations (numerical vectors), even for domains from new botnets. Finally, Domain2Vec uses numerical vectors to explore botnets. Experiments showed that Domain2Vec performs well on domain retrieval and clustering tasks without labeled data, outperforming the state of the art by 13% and 100%, respectively. Real-world tests demonstrate that Domain2Vec can effectively identify unreported malicious domains and monitor botnet activities.
Read full abstract