Abstract

Minimizing a system's attack surface and introducing diversity into it are two effective ways to improve its security. However, deciding how to add diversity without enlarging the attack surface more than necessary is difficult. It requires knowledge of the system's characteristics, its operating environment, and the available permutations, and that knowledge is generally unavailable before the system is deployed. We propose modeling a system's components, interfaces, and communication channels as states and actions in a sequential decision-making process. A multi-objective reinforcement learning algorithm then learns a set of policies that minimize the system's attack surface, and executing those policies while the system is running yields configuration diversity. We describe a methodology for designing a system so that its components and behaviors can be translated into a multi-objective Markov Decision Process. We then use three different multi-objective reinforcement learning algorithms to learn a set of optimal policies for an online file-sharing application, and show that our multi-objective temporal difference afterstate algorithm outperforms the alternatives on this example problem.
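The abstract does not give the paper's formulation or its afterstate algorithm, so the following is only a minimal sketch under invented assumptions. It models a hypothetical single component with three variants `A`, `B`, `C`, whose attack-surface sizes are made-up numbers. A diversity reward of 1 is earned whenever the deployed variant changes. Learning uses plain tabular Q-learning on a linearly scalarized vector reward, a standard multi-objective RL baseline and not the authors' multi-objective temporal difference afterstate algorithm. Sweeping the weight vector yields a set of policies, and a Pareto filter keeps the non-dominated ones. All names (`ATTACK_SURFACE`, `learn_policy`, `evaluate`, `pareto_front`) are illustrative.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy MOMDP: one component with three interchangeable variants.
# The attack-surface sizes below are invented for illustration only.
ATTACK_SURFACE: Dict[str, float] = {"A": 1.0, "B": 3.0, "C": 2.0}
VARIANTS: List[str] = list(ATTACK_SURFACE)
HORIZON = 4    # number of reconfiguration decisions per episode
START = "A"    # variant deployed when the episode begins

Vector = Tuple[float, float]  # (negative attack surface, diversity); both maximized


def reward(current: str, action: str) -> Vector:
    """Vector reward for deploying `action` while `current` is running."""
    switched = 1.0 if action != current else 0.0
    return (-ATTACK_SURFACE[action], switched)


def learn_policy(weights: Vector, episodes: int = 3000,
                 seed: int = 0) -> Dict[Tuple[int, str], str]:
    """Tabular Q-learning on the linearly scalarized reward weights . r."""
    rng = random.Random(seed)
    # Q[(t, current, action)]; time is part of the state because the horizon is finite.
    q = {(t, c, a): 0.0 for t in range(HORIZON) for c in VARIANTS for a in VARIANTS}
    for _ in range(episodes):
        current = START
        for t in range(HORIZON):
            # Uniform random exploration; Q-learning is off-policy, so it still
            # learns the values of the greedy policy.
            action = rng.choice(VARIANTS)
            r = reward(current, action)
            scalar = weights[0] * r[0] + weights[1] * r[1]
            future = 0.0 if t + 1 == HORIZON else max(q[(t + 1, action, a)] for a in VARIANTS)
            # Dynamics and rewards are deterministic, so a learning rate of 1 suffices.
            q[(t, current, action)] = scalar + future
            current = action
    # Greedy policy over every (time step, deployed variant) state.
    return {(t, c): max(VARIANTS, key=lambda a: q[(t, c, a)])
            for t in range(HORIZON) for c in VARIANTS}


def evaluate(policy: Dict[Tuple[int, str], str]) -> Vector:
    """Roll the greedy policy out from START and sum each objective separately."""
    current, totals = START, [0.0, 0.0]
    for t in range(HORIZON):
        action = policy[(t, current)]
        r = reward(current, action)
        totals[0] += r[0]
        totals[1] += r[1]
        current = action
    return (totals[0], totals[1])


def pareto_front(points: List[Vector]) -> List[Vector]:
    """Keep the return vectors that no other vector dominates."""
    def dominates(a: Vector, b: Vector) -> bool:
        return all(x >= y for x, y in zip(a, b)) and a != b
    return sorted({p for p in points if not any(dominates(o, p) for o in points)})


if __name__ == "__main__":
    weight_sweep = [(0.9, 0.1), (0.6, 0.4), (0.1, 0.9)]
    returns = [evaluate(learn_policy(w)) for w in weight_sweep]
    print("returns per weight:", returns)
    print("Pareto set:", pareto_front(returns))
```

On this toy instance the surface-heavy weight `(0.9, 0.1)` should keep variant `A` deployed, while `(0.6, 0.4)` and `(0.1, 0.9)` should alternate between `A` and `C`, so the filtered set contains two policies. Linear scalarization can only recover policies whose returns lie on the convex hull of the Pareto front, which is a known limitation of this baseline.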

