Abstract

Highly interactive honeypots can establish reliable connections with attackers by responding to their actions, thereby delaying and capturing intranet attacks. However, current research on honeypot interaction typically models the attacker as part of the environment and defines only single-step attack actions in simulation, ignoring the iterative nature of the attack-defense game; this is inconsistent with the correlated, sequential nature of actions in real attacks. As a result, the generated honeypot response strategies are insufficiently interactive and cannot sustain an effective, continuous game against attack behaviors. In this paper, we propose an autonomous attack response framework (named AARF) to enhance interaction based on multi-agent dynamic games. AARF consists of three parts: a virtual honeynet environment, attack agents, and defense agents. Attack agents are modeled to generate multi-step attack chains based on a Hidden Markov Model (HMM) combined with the generic threat framework ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge). The defense agents iteratively interact with the attack behavior chain based on reinforcement learning (RL) to learn optimal honeypot response strategies. To address the inefficient sample utilization of the random uniform sampling widely used in RL, we propose the dynamic value label sampling (DVLS) method for dynamic environments. DVLS improves sample utilization during the experience replay phase and thus the learning efficiency of honeypot agents under the RL framework; we further couple it with a classic Deep Q-Network (DQN) in place of the traditional random uniform sampling method. Based on AARF, we instantiate honeypot models with different functions for deception in intranet scenarios. In the simulation environment, honeypots collaboratively respond to multi-step intranet attack chains to defend against these attacks, which demonstrates the effectiveness of AARF. Compared with a classic DQN, the DQN with DVLS improves the average cumulative reward by more than eight percent and the convergence speed by five percent.
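The abstract does not spell out how DVLS assigns or uses its value labels, but the core idea it describes, replacing the uniform sampling of a classic DQN replay buffer with sampling weighted by a per-transition value, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the ValueLabelReplayBuffer class, the value_label argument, and the use of the reward magnitude as a stand-in label are assumptions introduced for the example.

```python
# Illustrative sketch only: assumes each stored transition carries a "value label"
# and that minibatches are drawn proportionally to those labels instead of uniformly.
import random
from dataclasses import dataclass

import numpy as np


@dataclass
class Transition:
    state: np.ndarray
    action: int
    reward: float
    next_state: np.ndarray
    done: bool


class ValueLabelReplayBuffer:
    """Replay buffer whose sampling probabilities follow per-transition value labels."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.transitions: list[Transition] = []
        self.labels: list[float] = []

    def push(self, transition: Transition, value_label: float) -> None:
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.labels.pop(0)
        self.transitions.append(transition)
        self.labels.append(max(value_label, 1e-6))  # keep weights strictly positive

    def sample(self, batch_size: int) -> list[Transition]:
        # Non-uniform draw: transitions with larger value labels are replayed more
        # often, in contrast to the random uniform sampling of a classic DQN buffer.
        probs = np.asarray(self.labels) / np.sum(self.labels)
        idx = np.random.choice(len(self.transitions), size=batch_size, p=probs)
        return [self.transitions[i] for i in idx]


if __name__ == "__main__":
    buffer = ValueLabelReplayBuffer(capacity=1000)
    for _ in range(200):
        t = Transition(np.zeros(4), random.randrange(2), random.random(), np.zeros(4), False)
        # The reward magnitude stands in for whatever per-sample score DVLS actually
        # computes (e.g., an estimate derived from the current Q-network).
        buffer.push(t, value_label=abs(t.reward))
    batch = buffer.sample(32)
    print(f"sampled {len(batch)} transitions")
```

In a full training loop, the sampled batch would feed the usual DQN temporal-difference update; only the sampling step above differs from uniform replay.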
