Moving target defense (MTD) is an emerging proactive defense technology, which can reduce the risk of vulnerabilities exploited by attacker. As a crucial component of MTD, route mutation (RM) faces a few fundamental problems defending against sophisticated Distributed-Denial of Service (DDoS) attacks: 1) it is unable to make optimal mutation selection due to insufficient learning in attack behaviors and 2) because network situation is time varying, RM also lacks self-adaptation in mutation parameters. In this article, we propose a context-aware Q-learning algorithm for RM (CQ-RM) that can learn attack strategies to optimize the selection of mutated routes. We first integrate four representative attack strategies into a unified mathematical model and formalize multiple network constraints. Then, taking above network constraints into considerations, we model RM process as a Markov decision process (MDP). To look for the optimal policy of MDP, we develop a context estimation mechanism and further propose the CQ-RM scheme, which can adjust learning rate and mutation period adaptively. Correspondingly, the optimal convergence of CQ-RM is proved theoretically. Finally, extensive experimental results highlight the effectiveness of our method compared to representative solutions.
Read full abstract