Abstract

In recent years, deep reinforcement learning has combined the strengths of reinforcement learning and deep learning and has made great progress in decision-making tasks. However, training a deep reinforcement learning agent requires frequent interaction between the agent and the environment, along with repeated experiments. By attacking the experimental environment during training, an adversary can poison the sample data the agent collects, which introduces security risks and can have serious consequences for the training process. This work addresses these security risks by improving the algorithm from the perspective of sample data filtering, thereby improving the security of deep reinforcement learning algorithms. It makes two contributions: first, it defends against adversarial attacks on deep reinforcement learning through cluster analysis and sample value evaluation; second, it proposes a deep reinforcement learning algorithm based on sample value evaluation, built on the deterministic policy gradient algorithm. The algorithm uses a clustering method to partition the sample pool and uses sample value evaluation to measure each sample's contribution to model training and the security risk it poses. Experiments on classic games show that the proposed algorithm is safe and effective: it reduces the risk of the agent falling victim to adversarial sample attacks and improves the training performance of deep reinforcement learning.
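
The filtering idea described above (cluster the sample pool, then score each sample before it is used for training) can be illustrated with a minimal sketch. The abstract does not specify the clustering method or the value function, so everything here is an assumption made for illustration: the tiny k-means routine, the scoring rule (cluster share divided by one plus the distance to the centroid), the threshold, and the names `kmeans`, `sample_values`, and `filter_pool` are all hypothetical and are not the authors' method.

```python
import math

def kmeans(points, k, iters=20):
    """Tiny deterministic k-means (assumed clustering method, not from the paper)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Farthest-point initialisation: pick the point farthest from its nearest centroid.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute the centroids.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

def sample_values(points, labels, centroids):
    """Hypothetical value score: high for samples in large, tight clusters."""
    n = len(points)
    sizes = {j: labels.count(j) for j in set(labels)}
    return [
        (sizes[lab] / n) / (1.0 + math.dist(p, centroids[lab]))
        for p, lab in zip(points, labels)
    ]

def filter_pool(points, k=3, threshold=0.2):
    """Keep only samples whose assumed value score reaches the threshold."""
    labels, centroids = kmeans(points, k)
    scores = sample_values(points, labels, centroids)
    return [p for p, s in zip(points, scores) if s >= threshold]

# Toy sample pool of (state, action, reward) tuples; the last entry mimics a poisoned reward.
pool = [
    (0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.1, 0.9), (0.1, 0.1, 1.1),
    (1.0, 1.0, 0.0), (0.9, 1.0, 0.1), (1.0, 0.9, 0.0), (0.9, 0.9, -0.1),
    (0.0, 0.0, -50.0),
]
kept = filter_pool(pool, k=3, threshold=0.2)
```

In this toy pool the outlying transition ends up alone in its own cluster, receives a low score, and is dropped, while the eight ordinary transitions are kept. In a real training loop the surviving samples would then be passed to the deterministic policy gradient update.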
