Abstract

Real-world autonomous systems operate under uncertainty about both their pose and dynamics. Autonomous control systems must simultaneously perform estimation and control to remain robust to changing dynamics and modelling errors. However, information-gathering actions often conflict with the actions that are optimal for the control objective, requiring a trade-off between exploration and exploitation. The specific problem setting considered here is discrete-time non-linear systems with process noise, input constraints, and parameter uncertainty. This study frames the problem as a Bayes-adaptive Markov decision process and solves it online using Monte Carlo tree search with an unscented Kalman filter to account for process noise and parameter uncertainty. The method is compared with certainty-equivalent model predictive control and a tree search method that approximates the QMDP solution, providing insight into when information gathering is useful. Discrete-time simulations characterise performance over a range of process noise levels and bounds on the unknown parameters. An offline optimisation method selects the Monte Carlo tree search parameters without hand-tuning. In lieu of recursive feasibility guarantees, a probabilistic bounding heuristic is offered that increases the probability of keeping the state within a desired region.

Highlights

  • Planning for tasks such as localization and manipulation requires an accurate model of the system dynamics [1]–[3]

  • A partially observable MDP (POMDP) is an extension of a Markov decision process (MDP) where the agent cannot directly observe the true state, but instead receives only stochastic observations whose distributions are conditioned on the state [26]

  • Confidence regions are computed from the given distributions to find the components of the upper bound corresponding to the belief over the hyperstate (β_b) and the process noise (β_w)


Summary

INTRODUCTION

Planning for tasks such as localization and manipulation requires an accurate model of the system dynamics [1]–[3]. Equations from the unscented transform have been incorporated as MPC constraints in an attempt to improve state estimation [21]–[23]. These methods cannot guarantee stability for systems with time-varying parameters following a Gaussian distribution, as the possible changes to the system are unbounded and cannot be corrected by an input-constrained control law [21], [24]. This paper builds upon preliminary work [31] by exploring performance for various bounds on parameter values, upgrading from an extended Kalman filter to an unscented Kalman filter, and applying an offline cross-entropy method [34] to determine solver parameters without hand-tuning. It also investigates a further approximation, a tree search variant of QMDP (QMDP-TS), which is more computationally efficient but assumes full observability of the model parameters after the first step.
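The offline cross-entropy method mentioned above iteratively fits a sampling distribution over solver parameters to the best-performing samples. The sketch below is a minimal, generic illustration of that idea, not the paper's implementation; the function and parameter names (`score`, `n_elite`, etc.) are illustrative assumptions, and in the paper's setting `score` would be a simulation-based estimate of closed-loop performance for a given set of Monte Carlo tree search parameters.

```python
import numpy as np

def cross_entropy_tune(score, mu, sigma, n_samples=50, n_elite=10,
                       n_iters=20, rng=None):
    """Cross-entropy method over a diagonal Gaussian.

    At each iteration: sample parameter vectors, evaluate them with
    `score` (higher is better), keep the top `n_elite`, and refit the
    Gaussian (mean and std) to those elites. Names are illustrative.
    """
    rng = np.random.default_rng(rng)
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    for _ in range(n_iters):
        samples = rng.normal(mu, sigma, size=(n_samples, mu.size))
        scores = np.array([score(s) for s in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]  # best n_elite rows
        # refit the sampling distribution; small floor keeps sigma > 0
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# Toy check: recover the maximiser of a concave quadratic at p = (3, 3).
best = cross_entropy_tune(lambda p: -np.sum((p - 3.0) ** 2),
                          mu=np.zeros(2), sigma=np.ones(2) * 5.0, rng=0)
```

For tuning a stochastic planner, `score` is itself noisy, so in practice one averages several simulation rollouts per sampled parameter vector before ranking the elites.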

BACKGROUND
Monte Carlo Tree Search and QMDP Tree Search
Unscented Kalman Filter
Confidence Regions
Cross-Entropy
PROBLEM FORMULATION
APPROACH
Monte Carlo Tree Search and QMDP-TS
Model Predictive Control
Simultaneous Estimation and Control
Probabilistic Bounding Heuristic
Planar Manipulation Model
Implementation Details
Cross-Entropy for Tuning Solver Parameters
Performance
Findings
CONCLUSIONS