Advanced persistent threats (APTs) are among today's major threats to cybersecurity. Highly determined attackers, armed with novel and evasive exfiltration techniques, allow APT attacks to elude most intrusion detection and prevention systems, causing significant losses for governments, organizations, and commercial entities. Intriguingly, despite greater recent efforts to defend against APTs, frequent upgrades to defense strategies have not translated into increased security and protection. In this paper, we demonstrate this phenomenon in an appropriately designed APT rivalry game that captures the interactions between attackers and defenders. We show that the defender's strategy adjustments leak useful information to attackers, so intelligent and rational attackers can improve their attacks by analyzing this information. Hence, a critical part of any defense strategy is choosing a suitable time to adjust it so that attackers learn as little information as possible. Another challenge for defenders is determining how to make the best use of limited resources to achieve a satisfactory defense level. To support these efforts, we derive the optimal timings for a player's strategy adjustments in terms of information leakage, which form a family of Nash equilibria. Moreover, we propose two learning mechanisms to help defenders find an appropriate defense level and allocate their resources reasonably: one based on adversarial bandits and the other based on deep reinforcement learning. Simulation experiments confirm the rationale behind the game model and the optimality of the derived equilibria. The results also demonstrate that players can indeed improve themselves by learning from past experience, underscoring the need to specify optimal strategy-adjustment timings when defending against APTs.
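To make the adversarial-bandit idea concrete, the sketch below shows a minimal EXP3-style learner, a standard algorithm for the adversarial-bandit setting. Here each "arm" stands in for a candidate defense level and the reward function for the observed defense payoff; this framing, the function name `exp3`, and all parameter values are illustrative assumptions, not the paper's exact formulation.

```python
import math
import random

def exp3(num_arms, rewards, gamma=0.1, rounds=2000, seed=1):
    """Minimal EXP3 adversarial-bandit sketch.

    `rewards(t, arm)` must return a reward in [0, 1]; arms play the
    role of candidate defense levels (an illustrative assumption).
    Returns the average realized reward and the final arm weights.
    """
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    total = 0.0
    for t in range(rounds):
        s = sum(weights)
        # mix the weight-proportional distribution with uniform exploration
        probs = [(1 - gamma) * w / s + gamma / num_arms for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        r = rewards(t, arm)
        total += r
        # importance-weighted reward estimate; only the chosen arm updates
        x_hat = r / probs[arm]
        weights[arm] *= math.exp(gamma * x_hat / num_arms)
    return total / rounds, weights

# Hypothetical usage: arm 2 is the best defense level (reward 0.9 vs 0.1);
# the learner's weights should concentrate on it over time.
avg, w = exp3(3, lambda t, a: 0.9 if a == 2 else 0.1)
```

EXP3's appeal in this setting is that it makes no statistical assumptions about how rewards are generated, so it remains sound even when an adaptive attacker chooses the payoffs adversarially.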