Abstract
Minimizing a system's attack surface and introducing diversity into the system are two effective ways to improve its security. However, adding diversity without enlarging the attack surface more than necessary is difficult: it requires knowledge of the system's characteristics, operating environment, and available permutations that is generally not available before deployment. We propose modeling a system's components, interfaces, and communication channels as a set of states and actions that can be analyzed as a sequential decision-making process. A multi-objective reinforcement learning algorithm then learns a set of policies that minimize the system's attack surface, and executing those policies while the system is operating yields configuration diversity. We describe a methodology for designing a system so that its components and behaviors can be translated into a multi-objective Markov decision process, demonstrate the learning of a set of optimal policies with three multi-objective reinforcement learning algorithms in the context of an online file-sharing application, and show that our multi-objective temporal-difference afterstate algorithm outperforms the alternatives on the example problem.
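To make the notion of a "set of optimal policies" concrete, the following minimal sketch filters candidate configurations down to the non-dominated (Pareto-optimal) ones when each carries a vector of costs. It is an illustration only, not the algorithm evaluated in this work: the configuration names, the two cost dimensions (attack-surface cost and operational cost), and the numbers are all hypothetical, and reading "optimal" as Pareto-optimal is an assumption.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """Return True if cost vector a dominates b (lower is better):
    a is no worse on every objective and strictly better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(costs: Dict[str, Vector]) -> List[str]:
    """Return the names (sorted) of all entries not dominated by any other entry."""
    return sorted(
        name
        for name, cost in costs.items()
        if not any(
            dominates(other_cost, cost)
            for other_name, other_cost in costs.items()
            if other_name != name
        )
    )


# Hypothetical configurations: (attack-surface cost, operational cost).
configs = {
    "A": (3.0, 1.0),
    "B": (2.0, 2.0),
    "C": (1.0, 4.0),
    "D": (3.0, 3.0),  # worse than B on both objectives
}

front = pareto_front(configs)
print(front)
```

Under these made-up numbers, configuration D is dropped because B is better on both objectives, while A, B, and C each represent a different trade-off. Keeping several non-dominated options, rather than a single best one, is what would let a running system switch among configurations and so gain diversity.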