Model-Based Reinforcement Learning

Soumya Ray, Prasad Tadepalli

doi:10.1007/978-1-4899-7502-7_561-1

Abstract

Reinforcement Learning (RL) refers to learning to behave optimally in a stochastic environment by taking actions and receiving rewards (Sutton and Barto 1998). The environment is assumed to be Markovian in that the probability of the next state is fixed given the current state and the agent’s action. The agent also receives an immediate reward based on the current state and the action. Models of the next-state distribution and the immediate rewards are referred to as “action models” and, in general, are not known to the learner. The agent’s goal is to take actions, observe the outcomes, including rewards and next states, and learn a policy, i.e., a mapping from states to actions, that optimizes some performance measure. Typically, the performance measure is the expected total reward in episodic domains, and the expected average reward per time step or the expected discounted total reward in infinite-horizon domains. The theory of Markov Decision Processes (MDPs) implies that, under fairly general conditions, there is a stationary policy, i.e., a time-invariant mapping from states to actions, that maximizes each of the above reward measures. Moreover, MDP solution algorithms such as value iteration and policy iteration (Puterman 1994) can solve the MDP exactly given the action models. As long as the number of states is not exceedingly large, this suggests a straightforward approach to model-based reinforcement learning. The models can be learned by interacting with the environment, i.e., taking actions, observing the resulting states and rewards, and estimating the parameters of the action models through maximum likelihood methods. Once the models are estimated to the desired accuracy, the MDP solution algorithms can be run to compute the optimal policy.
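The straightforward two-stage approach described in the abstract (estimate the action models by maximum likelihood from experience, then solve the estimated MDP) can be sketched in Python as follows. This is a minimal illustration, not the authors' method, and it rests on assumptions that are not in the source: a hypothetical five-state chain environment, uniformly random exploration for a fixed number of steps, the expected discounted total reward with a discount factor of 0.9 as the performance measure, and value iteration as the solver. All function and variable names are hypothetical.

```python
import random
from collections import defaultdict

N_STATES, N_ACTIONS = 5, 2  # hypothetical toy problem: a 5-state chain
LEFT, RIGHT = 0, 1


def true_step(s, a, rng):
    """Hypothetical 'true' environment, hidden from the learner.

    RIGHT succeeds with probability 0.8 (otherwise the agent stays put);
    LEFT is deterministic. Every action taken in the last state earns 1.
    """
    r = 1.0 if s == N_STATES - 1 else 0.0
    if a == RIGHT:
        s_next = min(s + 1, N_STATES - 1) if rng.random() < 0.8 else s
    else:
        s_next = max(s - 1, 0)
    return r, s_next


def collect_experience(n_steps, seed=0):
    """Interact with the environment using uniformly random actions."""
    rng = random.Random(seed)
    s, data = 0, []
    for _ in range(n_steps):
        a = rng.randrange(N_ACTIONS)
        r, s_next = true_step(s, a, rng)
        data.append((s, a, r, s_next))
        s = s_next
    return data


def estimate_models(data):
    """Maximum-likelihood action models: relative frequencies and mean rewards."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    visits = defaultdict(int)
    for s, a, r, s_next in data:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
        visits[(s, a)] += 1
    P, R = {}, {}
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            n = visits[(s, a)]
            if n == 0:
                # Unvisited pair: arbitrarily assume a zero-reward self-loop.
                P[(s, a)], R[(s, a)] = {s: 1.0}, 0.0
            else:
                P[(s, a)] = {s2: c / n for s2, c in counts[(s, a)].items()}
                R[(s, a)] = reward_sum[(s, a)] / n
    return P, R


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve the estimated MDP for the expected discounted total reward."""
    def q(s, a, V):
        return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())

    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            best = max(q(s, a, V) for a in range(N_ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    # Greedy (stationary) policy with respect to the converged values.
    policy = [max(range(N_ACTIONS), key=lambda a: q(s, a, V)) for s in range(N_STATES)]
    return V, policy


data = collect_experience(20000)
P, R = estimate_models(data)
V, policy = value_iteration(P, R)
print(policy)  # → [1, 1, 1, 1, 1] (always move RIGHT)
```

The learned success probability of RIGHT is only an estimate close to 0.8, so `V` holds the values of the estimated MDP rather than the true one. In this toy example the greedy policy is optimal for any estimated success probability above zero, so it matches the true optimal policy; in general, the quality of the computed policy depends on how accurately every state-action pair's model has been estimated.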

Similar Papers

Model-Based Reinforcement Learning
...
07 Feb 2012

Contraction Mappings in the Theory Underlying Dynamic Programming
Eric V Denardo
SIAM Review | VOL. 9
01 Apr 1967

Comparative effectiveness research on patients with acute ischemic stroke using Markov decision processes
Darong Wu ... Yuanqi Zhao
BMC Medical Research Methodology | VOL. 12
09 Mar 2012

Transition-based versus state-based reward functions for MDPs with Value-at-Risk
Shuai Ma ... Jia Yuan Yu
01 Oct 2017
