Abstract

Evolutionary game theory is widely applied in network attack and defense. Existing evolutionary-game methods for attack and defense analysis adopt the bounded rationality hypothesis. However, existing research ignores the fact that both sides gain more information about each other as the attack and defense game deepens, which may allow the attacker to crack a certain type of defense strategy and render it invalid. Such defense strategy failures reduce the accuracy and guidance value of existing methods. To solve this problem, we propose a reward value learning mechanism (RLM). By analyzing previous game information, RLM automatically applies an incentive or a punishment to the attack and defense reward values for the next stage, which reduces the probability of defense strategy failure. We introduce RLM into the dynamic network attack and defense process under incomplete information and construct a multistage evolutionary game model with a learning mechanism. Based on this model, we design an optimal defense strategy selection algorithm. Experimental results demonstrate that the evolutionary game model with RLM achieves better reward values and a better defense success rate than the evolutionary game model without RLM.

Highlights

  • The rapid development of IT infrastructures, such as cyber-physical systems and the Internet of Things, has brought convenience to individuals and enterprises

  • To address the above problems, we propose a reward value learning mechanism (RLM), a novel method for updating the reward value based on the game information in the previous stage

  • QDSj(k) denotes the expected revenue obtained by the defender choosing the defense strategy DSj in the k-th attack and defense confrontation at the same stage. The Q-learning replicator dynamic equations, formulas (15) and (16), are derived from the correlation formulas (2), (3), (15), and (14): x′(t)


Summary

Game Model Based on RLM-QRD

To solve the problem of specific defense strategies becoming invalid in network attack and defense scenarios, we put forward RLM with incentive and punishment mechanisms. In the first stage of the game, RLM defines the incentive and punishment factor α of the reward value. The defense payoff matrix DM comprises the defense revenue values dij generated by the defender under the attack and defense strategy combination (ASi, DSj). Yj(k) denotes the probability that the defender selects the defense strategy DSj in the k-th attack and defense confrontation at the same game stage. QDSj(k) denotes the expected revenue obtained by the defender choosing the defense strategy DSj in the k-th attack and defense confrontation at the same stage. RLM calculates α from the reward variable RV and the proportion of the number AN of a certain type of attack strategy in the past SN stages. According to α and the defense result R of the last stage, RLM adjusts the reward value of the corresponding attack and defense strategy, which in turn changes the attack and defense reward values of the current stage.
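The reward update described above can be illustrated with a minimal Python sketch. The exact formula for α is not reproduced in this summary, so the form used here (RV multiplied by the share AN/SN) is an assumption, as is the multiplicative adjustment of dij and its direction (raise on a successful defense, lower on a failed one). The function names are hypothetical, and the attacker's mixed strategy x is introduced only to show how an expected revenue such as QDSj could be read off the payoff matrix DM.

```python
def incentive_factor(rv: float, an: int, sn: int) -> float:
    """Assumed form of alpha: reward variable RV scaled by the share of
    one attack type (AN occurrences) in the past SN stages."""
    if sn <= 0:
        return 0.0  # no history yet, so no adjustment
    return rv * an / sn


def update_defense_payoffs(dm, attack_idx, defense_idx, alpha, defense_succeeded):
    """Return a copy of DM in which d_ij for the last stage's strategy pair
    is raised after a successful defense and lowered after a failed one.
    The multiplicative rule is an illustrative assumption."""
    new_dm = [row[:] for row in dm]
    sign = 1.0 if defense_succeeded else -1.0
    new_dm[attack_idx][defense_idx] *= 1.0 + sign * alpha
    return new_dm


def expected_defense_revenue(dm, attack_probs, defense_idx):
    """Expected revenue of defense strategy DS_j against an attacker mixed
    strategy x: the sum over i of x_i * d_ij."""
    return sum(x_i * row[defense_idx] for x_i, row in zip(attack_probs, dm))


# Rows are attack strategies AS_i, columns are defense strategies DS_j.
dm = [[4.0, 2.0],
      [1.0, 3.0]]

# AS_0 appeared in 2 of the past 4 stages, and DS_1 failed against it.
alpha = incentive_factor(rv=0.5, an=2, sn=4)
dm_next = update_defense_payoffs(dm, attack_idx=0, defense_idx=1,
                                 alpha=alpha, defense_succeeded=False)

x = [0.5, 0.5]  # hypothetical attacker mixed strategy
q_next = [expected_defense_revenue(dm_next, x, j) for j in range(2)]
```

In this sketch the failed pair (AS0, DS1) loses payoff for the next stage, so the expected revenue of DS1 falls relative to DS0, which is the kind of shift RLM relies on to steer the defender away from a cracked strategy.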

Optimal Defense Strategy Selection Algorithm
Experiment and Analysis
Findings
Conclusion