Active Directory (AD) is the default security management system for Windows domain networks. An AD environment can be described as a cyber-attack graph, in which nodes represent computers, accounts, and similar entities, and edges represent existing accesses or known exploits that let an attacker move from one node to another. This paper studies a Stackelberg game between one attacker and one defender on an AD attack graph. The attacker’s goal is to maximize the chance of reaching the destination before being detected. The defender’s goal is to block a constant number of edges so as to minimize the attacker’s chance of success. We show that the problem is #P-hard and therefore intractable to solve exactly. To defend the AD graph against cyber attackers, we propose two defensive approaches. In the first approach, we convert the attacker’s problem to an exponential-sized dynamic program that is approximated by a neural network (NN). Once trained, the NN serves as an efficient fitness function for the defender’s Evolutionary Diversity Optimization (EDO) based defensive policy. The emphasis on diversity in the defender’s solutions yields a diverse set of training samples, which improves the training accuracy of the NN that models the attacker. In the second approach, we propose a reinforcement learning (RL) based policy to solve the attacker’s problem and a critic-network-assisted EDO based defensive policy to solve the defender’s problem. Experimental results on synthetic AD graphs show that the proposed defensive policies are scalable and highly effective, approximate the attacker’s problem accurately, and generate good defensive plans.
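
To make the attacker/defender interaction concrete, the following is a minimal toy sketch and not the method proposed in this paper. It assumes a much simpler setting than the #P-hard problem studied here: each edge has an independent, hypothetical success probability, the attacker commits to a single path that maximizes the product of those probabilities, and the defender enumerates every set of `budget` edges by brute force. The graph `EDGES`, the node names, and the functions `best_attack_probability` and `best_defense` are illustrative assumptions.

```python
import heapq
import math
from itertools import combinations

# Hypothetical toy attack graph: (from_node, to_node) -> assumed success probability.
EDGES = {
    ("entry", "ws1"): 0.9,
    ("entry", "ws2"): 0.8,
    ("ws1", "admin"): 0.5,
    ("ws2", "admin"): 0.9,
    ("admin", "da"): 0.9,
    ("ws1", "da"): 0.3,
}


def best_attack_probability(edges, source, target, blocked=frozenset()):
    """Attacker's best response in the simplified model.

    Returns the largest product of edge probabilities over all source-target
    paths that avoid blocked edges, computed by Dijkstra on -log(p) weights.
    """
    adjacency = {}
    for (u, v), p in edges.items():
        if (u, v) in blocked or p <= 0.0:
            continue
        adjacency.setdefault(u, []).append((v, -math.log(p)))

    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return math.exp(-d)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adjacency.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return 0.0  # target unreachable


def best_defense(edges, source, target, budget):
    """Defender (leader) move: brute-force the edge set of size `budget`
    that minimizes the attacker's best-response success probability."""
    best_blocked = frozenset()
    best_value = best_attack_probability(edges, source, target)
    for combo in combinations(edges, budget):
        blocked = frozenset(combo)
        value = best_attack_probability(edges, source, target, blocked)
        if value < best_value:
            best_blocked, best_value = blocked, value
    return best_blocked, best_value


if __name__ == "__main__":
    print(best_attack_probability(EDGES, "entry", "da"))  # about 0.648
    print(best_defense(EDGES, "entry", "da", budget=1))   # blocks ("admin", "da"), about 0.27
```

The brute-force defender evaluates every one of the "number of edges choose budget" blocking sets, which is only feasible on tiny graphs. That cost, together with the hardness of the full attacker problem, is what motivates learned fitness functions and evolutionary search at realistic scale.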