Abstract

With the increasing use of machine learning by industry and the scientific community in tasks such as text mining, image recognition, and self-driving cars, the automatic setting of hyper-parameters in learning algorithms is a key factor for obtaining good performance regardless of user expertise in the inner workings of the techniques and methodologies. In particular, for a reinforcement learning algorithm, the efficiency with which an agent learns a control policy in an uncertain environment depends heavily on the hyper-parameters used to balance exploration with exploitation. In this work, an autonomous learning framework is proposed that integrates Bayesian optimization with Gaussian process regression to optimize the hyper-parameters of a reinforcement learning algorithm. A bandit-based approach is also presented to balance computational cost against decreasing uncertainty about the Q-values. A gridworld example is used to highlight how hyper-parameter configurations of a learning algorithm (SARSA) are iteratively improved based on two performance functions.
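
As a rough illustration of this optimization loop, the sketch below pairs a Gaussian process surrogate (scikit-learn's GaussianProcessRegressor with a Matérn kernel) with an expected-improvement acquisition over SARSA's hyper-parameters. The objective sarsa_performance is a synthetic stand-in for an expensive SARSA training run, and all ranges, budgets, and names are illustrative assumptions rather than the paper's exact RLOpt implementation.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def sarsa_performance(theta):
    """Stand-in for the expensive objective. In the real framework this
    would train a SARSA agent on the gridworld with hyper-parameters
    theta = (alpha, gamma, epsilon) and return its mean episode return;
    a smooth synthetic function keeps the sketch self-contained."""
    alpha, gamma, epsilon = theta
    return -((alpha - 0.3) ** 2 + (gamma - 0.95) ** 2 + (epsilon - 0.1) ** 2)

def expected_improvement(mu, sigma, best):
    # Standard EI acquisition for a maximization problem.
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
# Random candidate pool over (alpha, gamma, epsilon); bounds are illustrative.
candidates = rng.uniform([0.01, 0.80, 0.01], [1.0, 0.999, 0.5], size=(500, 3))

# Seed the surrogate with a few random evaluations of the objective.
X = [list(c) for c in candidates[:5]]
y = [sarsa_performance(x) for x in X]

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):                                   # small query budget
    gp.fit(np.array(X), np.array(y))
    mu, sigma = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, max(y)))]
    X.append(list(x_next))
    y.append(sarsa_performance(x_next))

best = X[int(np.argmax(y))]
print("best (alpha, gamma, epsilon):", best)
```

Each iteration refits the GP to every evaluation gathered so far and queries the objective only at the candidate that maximizes expected improvement, which is what keeps the number of expensive training runs small.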

Highlights

  • In recent years, with the notable increase of computational power in terms of floating-point operations, a vast number of machine learning applications have been attempted, yet hyper-parameter optimization is still needed to reach higher levels of performance.

  • In the most commonly studied tasks solvable by reinforcement learning (RL), the agent may take a long sequence of actions, receiving little or no reward until it arrives at a state for which a high reward is received. Such delayed reward signals imply long execution times, so optimizing hyper-parameters with strategies such as grid search, random search, Monte Carlo, or gradient-based methods is not suitable for efficient autonomous learning.

  • Reinforcement learning (RL) is a form of computational learning in which an agent is immersed in an environment E and aims to converge to an optimal policy by performing sequences of actions in different environmental states and receiving a reinforcement signal after each action is taken. RL can be combined with a large set of algorithms, such as deep learning representations [8], and has been employed in applications such as self-driving cars [25] and games such as Go [26, 27]; a minimal sketch of this agent-environment loop is given below.
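
To make the interaction loop concrete, the following is a minimal tabular SARSA agent on a toy 5x5 gridworld with a single rewarded goal state. The grid size, rewards, and default hyper-parameter values (alpha, gamma, epsilon) are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

N = 5                                            # toy 5x5 gridworld
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

def step(state, a):
    # Move within the grid; +1 at the goal corner, small step cost elsewhere.
    r, c = state
    dr, dc = ACTIONS[a]
    r, c = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
    done = (r, c) == (N - 1, N - 1)
    return (r, c), (1.0 if done else -0.01), done

def epsilon_greedy(Q, s, epsilon, rng):
    if rng.random() < epsilon:                   # explore
        return rng.integers(len(ACTIONS))
    return int(np.argmax(Q[s]))                  # exploit

def train(alpha=0.5, gamma=0.95, epsilon=0.1, episodes=200, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N, N, len(ACTIONS)))
    for _ in range(episodes):
        s, done = (0, 0), False
        a = epsilon_greedy(Q, s, epsilon, rng)
        while not done:
            s2, reward, done = step(s, a)
            a2 = epsilon_greedy(Q, s2, epsilon, rng)
            # SARSA update: bootstrap on the action actually taken next.
            Q[s][a] += alpha * (reward + gamma * Q[s2][a2] - Q[s][a])
            s, a = s2, a2
    return Q
```

Calling train() returns the learned Q-table; epsilon sets the exploration/exploitation balance the abstract refers to, which is exactly the kind of hyper-parameter the proposed framework tunes.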


Summary

Introduction

With the notable increase of computational power in terms of floating-point operations, a vast number of machine learning applications have been attempted, yet hyper-parameter optimization is still needed to reach higher levels of performance. The introduction is followed by a review of related work and then, in Section 3, by the proposed autonomous reinforcement learning framework, RLOpt, which aims to achieve fast convergence to near-optimal agent policies while minimizing the number of queries of the objective function at the meta-learning level.
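
One way to keep the number of such queries small is to treat candidate hyper-parameter configurations as arms of an N-armed bandit and re-evaluate the arm whose estimated performance plus uncertainty bonus is largest. The sketch below uses the standard UCB1 rule; the configs list and the noisy evaluate() function are illustrative placeholders, not the paper's exact decision algorithm.

```python
import math
import random

configs = [(0.1, 0.9), (0.3, 0.95), (0.5, 0.99)]   # (alpha, gamma) arms

def evaluate(arm):
    # Stand-in for one noisy SARSA run with configuration `arm`.
    alpha, _gamma = configs[arm]
    return -(alpha - 0.3) ** 2 + random.gauss(0.0, 0.05)

counts = [0] * len(configs)
means = [0.0] * len(configs)

for t in range(1, 101):                            # fixed evaluation budget
    if 0 in counts:                                # play each arm once first
        arm = counts.index(0)
    else:                                          # UCB1: mean + exploration bonus
        arm = max(range(len(configs)),
                  key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
    reward = evaluate(arm)
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean

best = max(range(len(configs)), key=lambda a: means[a])
print("preferred configuration:", configs[best])
```

The bonus term shrinks as an arm accumulates evaluations, so effort gradually shifts from reducing uncertainty about the Q-value estimates of each configuration to exploiting the best-looking one.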

Related Work
Reinforcement Learning
Bayesian Optimization
Gaussian Processes
Meta-learning
Bandit algorithms
RLOpt Framework
Experimental Setup
Comparison with random search
Variants with N-Armed Bandit Decision Algorithms
Findings
Concluding Remarks
