Given the large action and state spaces involved in penetration testing, reinforcement learning is widely applied to improve testing efficiency. This paper proposes an automatic penetration testing scheme based on hierarchical reinforcement learning that reduces both the action space and the state space. The scheme comprises a network-level agent, responsible for selecting the host to penetrate, and a host-level agent, responsible for performing penetration testing on the selected host. Within the network-level agent, an action-masking mechanism shields actions that are not currently available, shrinking the explorable action space and improving penetration testing efficiency. The host-level agent employs an invalid-action discrimination mechanism that terminates testing after actions that do not alter the system state, preventing sudden spikes in the number of neural network training steps for a single action. An optimistic estimation mechanism is also introduced to select penetration strategies suited to different hosts, avoiding training collapse caused by value confusion across hosts. Ablation experiments show that the host-level agent can learn distinct penetration strategies for different penetration depths without large fluctuations in training steps, and that the network-level agent can coordinate with the host-level agent to carry out network penetration stably. This hierarchical reinforcement learning framework detects network vulnerabilities more quickly and accurately, significantly reducing the cost of security policy updates.
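The abstract does not include code, but action masking of the kind described for the network-level agent is a standard technique: invalid actions are assigned a probability of zero before sampling. The sketch below is purely illustrative; the function name `masked_action_probabilities` and the assumption that the mask marks hosts already discovered by the agent are not taken from the paper.

```python
import numpy as np

def masked_action_probabilities(logits, action_mask):
    """Turn raw policy logits into a distribution over enabled actions only.

    logits      : array of shape (num_actions,) produced by the policy network
    action_mask : boolean array of shape (num_actions,); True = action enabled
    """
    # Disable unavailable actions by sending their logits to -inf,
    # so they receive zero probability after the softmax.
    masked_logits = np.where(action_mask, logits, -np.inf)
    masked_logits = masked_logits - masked_logits.max()  # numerical stability
    exp = np.exp(masked_logits)
    return exp / exp.sum()

# Hypothetical example: the network-level agent may only target hosts it has
# already discovered, so undiscovered hosts are masked out of the policy.
logits = np.array([1.2, -0.3, 0.8, 2.1])            # one logit per candidate host
discovered = np.array([True, True, False, False])    # hosts 2 and 3 unreachable
probs = masked_action_probabilities(logits, discovered)
action = np.random.choice(len(probs), p=probs)       # samples only enabled hosts
```

Masking at the probability level, rather than penalizing invalid actions through the reward, keeps the agent from ever exploring actions that cannot succeed, which is the efficiency gain the abstract attributes to this mechanism.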