Botnets have become one of the most serious threats to the Internet ecosystem, and botnet detection is crucial for tracking and mitigating network threats. Among emerging botnets, peer-to-peer (P2P) botnets are more dangerous and more resilient because of their distributed nature. Unstructured P2P botnets in particular use custom communication protocols whose traffic can blend in with legitimate P2P traffic. Their topology is also more complex and cannot easily be obtained in full, which makes them stealthier and harder to detect. A botnet is itself a kind of overlay network, and research shows that nodes with shared neighbors usually belong to the same community. Targeting unstructured P2P botnets and drawing on complex network theory, this article proposes Peertrap, a botnet detection framework that works from the perspective of shared neighbor nodes and is based on community detection with self-avoiding random walks (SAW) under incomplete topological information. First, network traffic is converted into NetFlow records using the Apache Flink big data platform. A P2P traffic cluster feature extraction rule, formulated from upstream and downstream traffic and address distribution threshold features, is then proposed to distinguish P2P traffic from non-P2P traffic. Next, the confidence between P2P clusters is calculated with the Jaccard coefficient to construct a shared neighbor graph, and P2P communities of the same type are mined by hierarchical clustering using the SAW algorithm combined with PCA. Finally, two community attributes, mean address distribution degree and mean closeness degree, are used to distinguish botnets. Experiments on three unstructured P2P botnet datasets (Sality, Kelihos, and ZeroAccess) and on the classic CTU datasets show good detection results. The framework overcomes one of the most critical P2P botnet detection challenges.
It can detect P2P bots with high accuracy in the presence of legitimate P2P traffic, incomplete network topology information, and C&C channel encryption. Our method is a typical application of complex network theory in the field of botnet detection, and it can detect botnets from different families in the same network, with good parallelism and scalability.
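
The shared neighbor graph and the self-avoiding walk mentioned above can be illustrated with a minimal Python sketch. It is not the Peertrap implementation: the function names, the `contacts` input format (host mapped to the set of remote addresses it contacted), the similarity threshold, and the similarity-weighted choice of the next step are all assumptions made for illustration, and the hierarchical clustering and PCA stages are omitted.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shared_neighbor_graph(contacts, threshold):
    """Build a weighted graph linking hosts whose contact sets overlap.

    contacts:  dict mapping host -> set of remote addresses (assumed format).
    threshold: minimum Jaccard coefficient for an edge (assumed parameter).
    Returns a dict of dicts: graph[u][v] = Jaccard weight of edge u-v.
    """
    graph = {host: {} for host in contacts}
    for u, v in combinations(contacts, 2):
        w = jaccard(contacts[u], contacts[v])
        if w > 0 and w >= threshold:
            graph[u][v] = w
            graph[v][u] = w
    return graph


def self_avoiding_walk(graph, start, max_steps, rng):
    """Walk from `start`, never revisiting a node.

    The next node is drawn from unvisited neighbors with probability
    proportional to edge weight (an illustrative choice). The walk stops
    when it is trapped or after `max_steps` steps.
    """
    path = [start]
    visited = {start}
    for _ in range(max_steps):
        current = path[-1]
        candidates = [n for n in graph[current] if n not in visited]
        if not candidates:
            break  # every neighbor has already been visited
        weights = [graph[current][n] for n in candidates]
        nxt = rng.choices(candidates, weights=weights, k=1)[0]
        path.append(nxt)
        visited.add(nxt)
    return path
```

Because a self-avoiding walk cannot step back onto visited nodes, it tends to exhaust a densely connected group of hosts before it stops, which is the intuition behind using such walks to expose communities.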