Abstract
The Linearly solvable Markov Decision Process (LMDP) is a class of optimal control problems in which the Bellman equation can be converted into a linear equation by an exponential transformation of the state value function (Todorov, 2009b). In an LMDP, the optimal value function and the corresponding control policy are obtained by solving an eigenvalue problem in a discrete state space, or an eigenfunction problem in a continuous state space, using knowledge of the system dynamics and the action, state, and terminal cost functions. In this study, we evaluate the effectiveness of the LMDP framework in real robot control, where the dynamics of the body and the environment have to be learned from experience. We first perform a simulation study of a pole swing-up task to evaluate the effect of the accuracy of the learned dynamics model on the derived action policy. The result shows that a crude linear approximation of the non-linear dynamics can still allow the task to be solved, albeit with a higher total cost. We then perform real robot experiments on a battery-catching task using our Spring Dog mobile robot platform. The state is given by the position and size of a battery in the robot's camera view and two neck joint angles. The action is the velocities of the two wheels, while the neck joints are controlled by a visual servo controller. We test linear and bilinear dynamics models in tasks with quadratic and Gaussian state cost functions. In the quadratic-cost task, the LMDP controller derived from a learned linear dynamics model performed equivalently to the optimal linear quadratic regulator (LQR). In the non-quadratic task, the LMDP controller with a linear dynamics model showed the best performance. These results demonstrate the usefulness of the LMDP framework in real robot control even when simple linear models are used for dynamics learning.
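As a concrete illustration of the discrete-state case mentioned in the abstract, the sketch below shows how the linear Bellman equation of a first-exit LMDP can be solved by simple iteration once the passive transition matrix, the state cost, and the terminal cost are known. This is a generic sketch of the LMDP machinery (Todorov, 2009b), not code from the paper; the function and variable names are our own.

```python
import numpy as np

def solve_lmdp_first_exit(P, q, terminal, q_term, n_iter=10000, tol=1e-10):
    """Iterate the linear Bellman equation z = diag(exp(-q)) P z of a
    first-exit LMDP.  z(x) = exp(-v(x)) is the desirability function;
    terminal states keep z fixed at exp(-q_term)."""
    n = P.shape[0]
    z = np.ones(n)
    z[terminal] = np.exp(-q_term[terminal])
    G = np.exp(-q)                        # per-state cost factor
    for _ in range(n_iter):
        z_new = G * (P @ z)               # linear Bellman backup
        z_new[terminal] = np.exp(-q_term[terminal])
        if np.max(np.abs(z_new - z)) < tol:
            z = z_new
            break
        z = z_new
    v = -np.log(z)                        # optimal value function
    U = P * z[None, :]                    # optimal controlled transitions:
    U /= U.sum(axis=1, keepdims=True)     #   u*(x'|x) proportional to p(x'|x) z(x')
    return v, U
```

In the continuous-state setting used in the robot experiments, the analogous computation is an eigenfunction problem rather than this finite linear system, but the discrete case conveys the structure of the solution.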
Highlights
When we want to design an autonomous robot that can act optimally in its environment, the robot should solve non-linear optimization problems in continuous state and action spaces
Although it has been reported that the framework of the Linearly solvable Markov Decision Process (LMDP) can find an optimal policy faster than conventional reinforcement learning algorithms, the LMDP requires knowledge of the state transition probabilities in advance
We demonstrated that the LMDP framework can be successfully used with the environmental dynamics estimated by model learning
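As the highlights note, the LMDP normally assumes the state transition model is given, whereas here it is estimated by model learning. Below is a minimal sketch of one common way to do this, fitting a linear dynamics model to logged transitions by least squares; the fitting procedure and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_linear_dynamics(X, U, X_next):
    """Least-squares fit of x_{t+1} ~ A x_t + B u_t + c from logged
    transitions.  X, X_next: (T, n_state); U: (T, n_action)."""
    T = X.shape[0]
    Phi = np.hstack([X, U, np.ones((T, 1))])      # regressors [x, u, 1]
    W, *_ = np.linalg.lstsq(Phi, X_next, rcond=None)
    n, m = X.shape[1], U.shape[1]
    A = W[:n].T                                   # state matrix
    B = W[n:n + m].T                              # control matrix
    c = W[n + m]                                  # constant offset
    return A, B, c
```

A fitted (A, B) of this kind can then serve as the deterministic part of the passive dynamics when constructing the LMDP, or as the system matrices for an LQR baseline.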
Summary
When we want to design an autonomous robot that can act optimally in its environment, the robot should solve non-linear optimization problems in continuous state and action spaces. If a precise model of the environment is available, both optimal control (Todorov, 2006) and model-based reinforcement learning (Sutton and Barto, 1998) provide a computational framework for finding an optimal control policy that minimizes cumulative costs (or maximizes cumulative rewards). The non-linear Hamilton-Jacobi-Bellman (HJB) equation must be solved in order to derive an optimal policy. Dynamic programming solves the Bellman equation, a discrete-time version of the HJB equation, for problems with discrete states and actions.
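For contrast with the linear LMDP computation sketched above, the Bellman backup that dynamic programming applies in the discrete case involves a minimization over actions. The following generic value-iteration sketch (standard textbook material, not taken from the paper; names are illustrative) makes this explicit:

```python
import numpy as np

def value_iteration(P, cost, gamma=0.99, n_iter=1000, tol=1e-8):
    """Generic dynamic programming for a discrete MDP.
    P: (n_actions, n_states, n_states) transition probabilities,
    cost: (n_states, n_actions) immediate costs."""
    n_states = cost.shape[0]
    v = np.zeros(n_states)
    for _ in range(n_iter):
        # Bellman backup: Q(x, u) = cost(x, u) + gamma * E[v(x') | x, u]
        Q = cost + gamma * np.einsum('axy,y->xa', P, v)
        v_new = Q.min(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    policy = Q.argmin(axis=1)             # greedy (optimal) policy
    return v, policy
```

The exponential transformation used in the LMDP removes this per-state minimization over actions, which is what makes the resulting equation linear in the desirability function z = exp(-v).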