Abstract

With the widespread application of artificial intelligence and automation, a growing number of devices are monitored by computer systems. In many cases, multiple management and control information systems together form a comprehensive information system network. As such networks grow in scale and their topologies become more complex, fixed-mode network control policies, which were designed for small, simple networks and often lack the ability to cope with dynamic environments, can no longer handle the security policy task. This paper therefore proposes an online learning algorithm for network security policy based on Sarsa with optimistic initial values. The algorithm consists of two parts: a defence agent and an attacking agent. The defence agent learns and improves the system protection policy by defending against simulated attacks launched by the attacking agent. It uses the Sarsa method, which draws on historical experience, to improve its defence policy online. The use of optimistic initial values reduces training time.
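To make the idea concrete, the following is a minimal sketch of tabular Sarsa with optimistic initial values on a toy defence problem. It is not the authors' implementation. The environment is an assumption made purely for illustration: a network of three nodes, a scripted attacker that always targets the node after the one it attacked last, a state equal to the last attacked node, and a reward of +1 when the defender protects the attacked node and -1 otherwise. The names `train_defender`, `sarsa_update`, `epsilon_greedy` and `greedy_policy`, and all hyperparameter values, are hypothetical.

```python
import random


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    return values.index(max(values))


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy TD update: the target uses the action actually chosen next."""
    td_target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])


def train_defender(n_nodes=3, episodes=500, steps=20, alpha=0.1,
                   gamma=0.9, epsilon=0.1, q_init=10.0, seed=0):
    """Train a defence agent against a scripted attacker (toy assumption).

    q_init is optimistic: with rewards of at most 1 and gamma = 0.9, no
    return can exceed 1 / (1 - 0.9) = 10, so every untried action looks at
    least as good as it can really be and gets tried early.
    """
    rng = random.Random(seed)
    q = {(s, a): q_init for s in range(n_nodes) for a in range(n_nodes)}
    for _ in range(episodes):
        s = rng.randrange(n_nodes)  # node attacked just before the episode
        a = epsilon_greedy(q, s, n_nodes, epsilon, rng)
        for _ in range(steps):
            attacked = (s + 1) % n_nodes  # scripted attacker (assumption)
            r = 1.0 if a == attacked else -1.0
            s_next = attacked
            a_next = epsilon_greedy(q, s_next, n_nodes, epsilon, rng)
            sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma)
            s, a = s_next, a_next
    return q


def greedy_policy(q, n_nodes):
    """Return the greedy defence action for each state."""
    policy = []
    for s in range(n_nodes):
        values = [q[(s, a)] for a in range(n_nodes)]
        policy.append(values.index(max(values)))
    return policy


if __name__ == "__main__":
    q_table = train_defender()
    print(greedy_policy(q_table, 3))
```

Under these assumptions the learned greedy policy should protect the node the attacker targets next in each state. Setting `q_init` to 0.0 gives a non-optimistic baseline for comparing how quickly each action is explored.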
