Abstract

Policy search algorithms have facilitated the application of Reinforcement Learning (RL) to dynamic systems, such as the control of robots. Many policy search algorithms are based on the policy gradient, and thus may suffer from slow convergence or convergence to local optima. In this paper, we take a Bayesian approach to policy search under the RL paradigm, for the problem of controlling a discrete-time Markov decision process with continuous state and action spaces and with a multiplicative reward structure. For this purpose, we assume a prior over the policy parameters and target the ‘posterior’ distribution in which the ‘likelihood’ is the expected reward. We propose a Markov chain Monte Carlo (MCMC) algorithm as a method of generating samples of the policy parameters from this posterior. The proposed algorithm is compared with several well-known policy gradient-based RL methods and exhibits better performance in terms of time response and convergence rate, when applied to a nonlinear model of a Cart-Pole benchmark.
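To make the sampling idea concrete, here is a minimal random-walk Metropolis-Hastings sketch in Python in which the ‘likelihood’ of a policy parameter vector is its expected reward. This is an illustration under stated assumptions, not the paper's algorithm: the standard-normal prior, the synthetic expected_reward surrogate (the paper estimates this quantity with sequential Monte Carlo rollouts of the MDP), and the proposal step size are all invented for the sketch.

import numpy as np

rng = np.random.default_rng(0)

def expected_reward(theta):
    # Synthetic stand-in for the expected multiplicative total reward R(theta);
    # in the paper this quantity is estimated from rollouts of the controlled MDP.
    return np.exp(-0.5 * np.sum((theta - 1.5) ** 2))

def log_prior(theta):
    # Standard-normal prior over the policy parameters (an assumption of this sketch).
    return -0.5 * np.sum(theta ** 2)

def metropolis_hastings(n_iters=5000, dim=2, step=0.3):
    theta = np.zeros(dim)
    log_post = log_prior(theta) + np.log(expected_reward(theta))
    samples = []
    for _ in range(n_iters):
        # Random-walk proposal around the current parameters.
        proposal = theta + step * rng.standard_normal(dim)
        log_post_prop = log_prior(proposal) + np.log(expected_reward(proposal))
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < log_post_prop - log_post:
            theta, log_post = proposal, log_post_prop
        samples.append(theta.copy())
    return np.array(samples)

samples = metropolis_hastings()
print('posterior mean of theta:', samples.mean(axis=0))

With a reward surrogate peaked at theta = (1.5, 1.5) and a prior centred at the origin, the posterior mean lands between the two, which is the qualitative behaviour a reward-as-likelihood posterior should show.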

Highlights

  • Reinforcement learning (RL) is a topic of broad interest among researchers in machine learning, control and robotics; see Sutton and Barto (1998) for an introduction

  • We considered the control problem of a Markov decision process (MDP) with continuous state and action spaces over a finite time horizon and presented a novel policy search method for a class of parameterized policies in RL under a Bayesian learning framework

  • The proposed learning approach is an MCMC algorithm that uses estimates of the expected multiplicative total reward obtained with a sequential Monte Carlo (SMC) algorithm; a minimal sketch of such an estimator appears after this list
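As referenced above, the following minimal Python sketch shows one standard way a particle method can estimate an expected multiplicative total reward E[prod_t r(x_t)]: particles are propagated through the dynamics, weighted by the per-step reward, and resampled, and the running product of the mean weights is the estimate. The linear toy dynamics, linear state-feedback policy and per-step reward below are assumptions for illustration, not the paper's Cart-Pole model.

import numpy as np

rng = np.random.default_rng(1)

def step(states, actions):
    # Toy scalar dynamics with Gaussian process noise (an assumption).
    return 0.9 * states + actions + 0.1 * rng.standard_normal(states.shape)

def policy(states, theta):
    # Linear state-feedback policy, assumed for the sketch.
    return theta * states

def reward(states):
    # Per-step reward in (0, 1]; the total reward is the product over time.
    return np.exp(-0.5 * states ** 2)

def smc_multiplicative_reward(theta, horizon=20, n_particles=200):
    # SMC estimate of E[prod_t r(x_t)]: at each step the mean particle
    # weight contributes one factor of the estimate, then particles are
    # resampled in proportion to their weights.
    states = rng.standard_normal(n_particles)
    estimate = 1.0
    for _ in range(horizon):
        states = step(states, policy(states, theta))
        weights = reward(states)
        estimate *= weights.mean()
        idx = rng.choice(n_particles, size=n_particles, p=weights / weights.sum())
        states = states[idx]  # multinomial resampling
    return estimate

print(smc_multiplicative_reward(theta=-0.5))

An MCMC policy search loop would call such an estimator once per proposed parameter vector to obtain the (noisy) ‘likelihood’ entering the acceptance ratio.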


Summary

A Markov chain Monte Carlo algorithm for Bayesian policy search

Systems Science & Control Engineering: An Open Access Journal, 2018, Vol.
Vahid Tavakol Aghaeiᵃ, Ahmet Onatᵃ and Sinan Yıldırımᵇ
ᵃFaculty of Engineering and Natural Sciences, Mechatronics Engineering, Sabancı University, Istanbul, Turkey; ᵇFaculty of Engineering and Natural Sciences, Industrial Engineering, Sabancı University, Istanbul, Turkey

Introduction
Markov decision processes
Reward-based policy search
Gradient-based algorithms for policy optimization
A Bayesian framework for RL
MCMC for policy search
The inverted pendulum model
Simulations
Numerical experiments
Findings
Conclusion and future work
