Abstract

Good system performance depends on the correct setting of configuration parameters, and the optimal configuration in turn depends on the system's incoming workload. In this paper, we utilize Markov decision process (MDP) theory and present a reinforcement learning strategy to discover the complex relationship between the system workload and the corresponding optimal configuration. To address the limitations of reinforcement learning algorithms currently used in system management, we present a different learning architecture for the configuration tuning task, consisting of two units: an actor and a critic. The actor realizes a stochastic policy that maps the system state to a configuration setting, while the critic uses a value function to provide reinforcement feedback to the actor. Both the actor and the critic are implemented as multilayer neural networks, and the error back-propagation algorithm adjusts the network weights based on the temporal-difference error produced during learning. Experimental results demonstrate that the proposed learning process identifies correct configuration tuning rules, which in turn improve system performance significantly.
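As a concrete illustration of the actor-critic scheme the abstract describes, here is a minimal Python/NumPy sketch. It is an assumption-laden toy, not the paper's implementation: the 4-dimensional workload state, the three candidate configurations, the reward function, the network sizes, and the learning rates are all invented for illustration. Only the structure follows the abstract: a stochastic actor mapping the system state to a configuration, a critic estimating a value function, and back-propagation driven by the temporal-difference error.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(n_in, n_hidden, n_out):
    # One-hidden-layer network with small random weights.
    return {"W1": rng.normal(0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out)}

def forward(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h, h @ p["W2"] + p["b2"]

def backward(p, x, h, g_out, lr):
    # Back-propagate the output gradient g_out and take one SGD step.
    g_h = (p["W2"] @ g_out) * (1.0 - h ** 2)   # gradient through tanh
    p["W2"] -= lr * np.outer(h, g_out)
    p["b2"] -= lr * g_out
    p["W1"] -= lr * np.outer(x, g_h)
    p["b1"] -= lr * g_h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical toy setting: a 4-dimensional workload state and K = 3
# candidate configurations; reward 1 when the chosen configuration matches
# the (hidden) best one for the current workload, 0 otherwise.
N_STATE, K, GAMMA = 4, 3, 0.9
actor = mlp_init(N_STATE, 16, K)    # state -> softmax over configurations
critic = mlp_init(N_STATE, 16, 1)   # state -> scalar value V(s)

def new_workload():
    return rng.normal(size=N_STATE)

def reward(s, a):
    return 1.0 if a == int(np.argmax(s[:K])) else 0.0

s, avg = new_workload(), 0.0
for step in range(20000):
    h_a, logits = forward(actor, s)
    pi = softmax(logits)
    a = rng.choice(K, p=pi)                    # sample from stochastic policy
    r, s_next = reward(s, a), new_workload()

    h_c, v = forward(critic, s)
    _, v_next = forward(critic, s_next)
    delta = r + GAMMA * v_next[0] - v[0]       # temporal-difference error

    # Critic: descend 0.5 * delta^2, treating the TD target as fixed.
    backward(critic, s, h_c, np.array([-delta]), lr=0.01)
    # Actor: policy-gradient step weighted by the TD error.
    g = pi.copy(); g[a] -= 1.0                 # grad of -log pi[a] w.r.t. logits
    backward(actor, s, h_a, delta * g, lr=0.01)

    avg += (r - avg) / (step + 1)
    s = s_next

print(f"average reward over training: {avg:.2f}")  # ~1/3 for a random policy
```

The key coupling, as in the abstract, is that the same TD error δ = r + γV(s′) − V(s) serves two roles: it is the critic's regression error and the reinforcement signal that scales the actor's policy-gradient step.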
