Autonomous Penetration Testing Based on Improved Deep Q-Network

Shicheng Zhou, Yue Zhang, Jingju Liu, Dongdong Hou, Xiaofeng Zhong

doi:10.3390/app11198823

Open Access

https://doi.org/10.3390/app11198823

Abstract

Penetration testing is an effective way to test and evaluate cybersecurity by simulating a cyberattack. However, traditional methods rely heavily on domain expert knowledge, which incurs prohibitive labor and time costs. Autonomous penetration testing is a more efficient and intelligent way to solve this problem. In this paper, we model penetration testing as a Markov decision process and use reinforcement learning for autonomous penetration testing in large-scale networks. We propose an improved deep Q-network (DQN), named NDSPI-DQN, to address the sparse reward and large action space problems in large-scale scenarios. First, we integrate five extensions to DQN (noisy nets, soft Q-learning, dueling architectures, prioritized experience replay, and an intrinsic curiosity model) to improve exploration efficiency. Second, we decouple the action and split the estimators of the neural network to calculate the two elements of an action separately, thereby reducing the action space. Finally, we investigate the performance of the algorithms in a range of scenarios. The experimental results demonstrate that our methods achieve better convergence and scaling performance.
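The effect of decoupling the action can be illustrated with a small sketch. It assumes, purely for illustration, that the two elements of an action are a target host and an operation; the abstract does not name them, and the function names and sizes below are hypothetical rather than taken from the paper.

```python
from typing import Sequence, Tuple


def joint_output_size(num_targets: int, num_operations: int) -> int:
    """Outputs needed when one estimator scores every (target, operation) pair."""
    return num_targets * num_operations


def decoupled_output_size(num_targets: int, num_operations: int) -> int:
    """Outputs needed when two separate estimators score each element on its own."""
    return num_targets + num_operations


def select_decoupled_action(
    q_target: Sequence[float], q_operation: Sequence[float]
) -> Tuple[int, int]:
    """Greedy choice: take the argmax of each estimator independently.

    q_target and q_operation stand in for the outputs of the two split
    estimators (hypothetical values, not produced by a real network here).
    """
    target = max(range(len(q_target)), key=lambda i: q_target[i])
    operation = max(range(len(q_operation)), key=lambda i: q_operation[i])
    return target, operation
```

Under these assumptions, a network with 50 targets and 10 operations would need 500 outputs for a joint estimator but only 60 for the two split estimators, which is the kind of reduction the decoupling aims at.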

Highlights

Penetration testing is an active, authorized simulated cyberattack that aims to assess cybersecurity and discover hidden vulnerabilities. Currently, pentesting plays a crucial role in strengthening the defense of computer systems against cyberattacks, as digital assets are more frequently exposed to hackers’ persistent, varied, and complex threats than ever before. Traditional pentesting methods rely mainly on highly skilled cybersecurity experts with domain-specific knowledge and experience, which incurs prohibitive labor and time costs.
To use reinforcement learning (RL) for autonomous pentesting, we model pentesting as a Markov decision process (MDP) defined by the tuple <S, A, R, T>.
The agent gradually learns to use as few steps as possible to obtain the maximum reward. Both NDSPI-DQN and its decoupled version converge to an approximately optimal value within a limited number of episodes (∼600 episodes for the decoupled version and ∼1000 for NDSPI-DQN), while the baseline deep Q-network (DQN) fails to converge during training.
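To make the tuple <S, A, R, T> concrete, the following is a minimal sketch of a toy, single-host MDP. Everything in it is an assumption for illustration: the three states, the two actions, the step cost of -1, the bonus of 10 for compromising the host, and the deterministic transitions are not from the paper. It also uses plain Q-value iteration (a planning method that sweeps over all state-action pairs) instead of the paper's NDSPI-DQN, only to show why a per-step cost favors short attack paths.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class MDP:
    states: List[str]                           # S
    actions: List[str]                          # A
    reward: Callable[[str, str, str], float]    # R(s, a, s')
    transition: Callable[[str, str], str]       # T(s, a) -> s' (deterministic here)
    terminal: str


def transition(state: str, action: str) -> str:
    # Hypothetical rule: an exploit only succeeds after the host was scanned.
    if state == "unknown" and action == "scan":
        return "scanned"
    if state == "scanned" and action == "exploit":
        return "compromised"
    return state


def reward(state: str, action: str, next_state: str) -> float:
    # Every action costs 1; compromising the host earns a bonus of 10.
    bonus = 10.0 if next_state == "compromised" else 0.0
    return -1.0 + bonus


def q_value_iteration(
    mdp: MDP, gamma: float = 0.9, sweeps: int = 50
) -> Dict[Tuple[str, str], float]:
    q = {(s, a): 0.0 for s in mdp.states for a in mdp.actions}
    for _ in range(sweeps):
        for s in mdp.states:
            if s == mdp.terminal:
                continue
            for a in mdp.actions:
                s2 = mdp.transition(s, a)
                # The terminal state has no future value.
                v2 = 0.0 if s2 == mdp.terminal else max(q[(s2, b)] for b in mdp.actions)
                q[(s, a)] = mdp.reward(s, a, s2) + gamma * v2
    return q


def greedy_policy(mdp: MDP, q: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    return {
        s: max(mdp.actions, key=lambda a: q[(s, a)])
        for s in mdp.states
        if s != mdp.terminal
    }


mdp = MDP(
    states=["unknown", "scanned", "compromised"],
    actions=["scan", "exploit"],
    reward=reward,
    transition=transition,
    terminal="compromised",
)
q = q_value_iteration(mdp)
policy = greedy_policy(mdp, q)
print(policy)
```

In this toy setting the greedy policy scans first and then exploits, because any wasted action only adds step cost and delays the bonus.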

Summary

Introduction

Penetration testing (PT, or pentesting for short) is an active, authorized simulated cyberattack that aims to assess cybersecurity and discover hidden vulnerabilities. Currently, pentesting plays a crucial role in strengthening the defense of computer systems against cyberattacks, as digital assets are more frequently exposed to hackers’ persistent, varied, and complex threats than ever before. Traditional pentesting methods rely mainly on highly skilled cybersecurity experts with domain-specific knowledge and experience, which incurs prohibitive labor and time costs. Compared with the human-based approach, performing pentesting autonomously is more efficient and intelligent. It enables regular security testing without expensive specialists and makes the pentesting process accessible to nonexperts.

Attackers have to use scanning tools to further their knowledge of the target. The information they gather covers the operating system (OS), running services, and other vulnerability-relevant information. Based on this information, they use a payload to exploit the discovered vulnerability in the target system with the aim of gaining control.

Methods

Results

Conclusion