Abstract

Botnets have become one of the main threats to cyberspace security. A growing number of bots use a domain generation algorithm (DGA) to generate malicious domain names for communicating with Command & Control (C&C) servers. A well-designed DGA can bypass traditional detection methods such as sinkholing and rule filtering, which raises new challenges for cyberspace security. In machine learning, the n-gram is a semantic model that characterizes the relationship among neighboring morphemes, while deep convolutional neural networks have a strong capability for processing information with translation-invariant properties. In this paper, we combine the n-gram with a deep convolutional neural network and propose a novel n-gram combined character-based domain classification (n-CBDC) model. The n-CBDC model runs end to end and requires neither hand-extracted features nor domain name system (DNS) contextual information; it takes only the domain name itself as input and automatically estimates the probability that the domain name was generated by a DGA. Experiments on real-world data show that the proposed method effectively detects DGA-generated domain names, with a 98.69% average detection rate and a 0.9829 average F-measure, and significantly outperforms state-of-the-art methods in detecting pronounceable and wordlist-based DGA domain names, with a detection rate of more than 93.89%. The proposed detection method is therefore robust and adapts to a wide range of DGA-generated domain name types.
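As a rough illustration of what character-level n-grams over a domain name look like, the following minimal sketch splits a domain string into overlapping n-grams. It is not the n-CBDC implementation: the function name `char_ngrams`, the lowercasing step, and the default `n = 2` are assumptions made for this example, since the abstract does not specify the preprocessing or the value of n.

```python
from typing import List


def char_ngrams(domain: str, n: int = 2) -> List[str]:
    """Return the overlapping character n-grams of a domain name.

    Hypothetical helper for illustration only; the preprocessing used
    by the n-CBDC model is not described in the abstract.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Assumption: normalize case and drop leading/trailing dots.
    name = domain.lower().strip(".")
    # Slide a window of width n across the string, one character at a time.
    return [name[i:i + n] for i in range(len(name) - n + 1)]


if __name__ == "__main__":
    print(char_ngrams("abc.com", 2))
```

A sequence like this could then be mapped to integer indices and fed to an embedding layer ahead of the convolutional layers, though that pipeline is likewise an assumption here.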
