Abstract

Reinforcement learning (RL) methods have become increasingly popular for sequential decision-making tasks due to their empirical success. However, the large state and action spaces of real-world problems modeled as Markov decision processes (MDPs) limit the use of RL algorithms. Given a finite-horizon MDP with state space S, action space A, and horizon H, one needs Ω(|S||A|H³/ε²) samples under a generative model to learn an ε-optimal policy [3], which can be impractical when S and A are large. This tabular RL framework does not capture the fact that many real-world systems have additional structure that, if exploited, should improve computational and statistical efficiency. Moreover, [1] empirically verifies that optimal and near-optimal action-value functions (viewed as |S|-by-|A| matrices) of classical stochastic control tasks have low rank. Thus, the critical question is: what are the minimal low-rank structural assumptions that allow for computationally and statistically efficient learning?
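The low-rank observation attributed to [1] can be checked numerically on a toy problem. The sketch below is illustrative only: the random MDP, the sizes of S and A, and the discount factor are assumptions for demonstration, not the classical control tasks studied in [1], and for simplicity it uses a discounted rather than finite-horizon formulation. It computes Q* by value iteration and inspects the singular values of the |S|-by-|A| matrix Q*; a sharply decaying spectrum indicates approximate low rank.

```python
# Illustrative sketch (assumed toy setting, not the tasks from [1]):
# build a random tabular MDP, solve for Q* by value iteration, and
# examine the singular value spectrum of Q* viewed as an |S|x|A| matrix.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 50, 10, 0.95  # assumed sizes and discount

# Random transition kernel P[s, a, s'] (rows sum to 1) and rewards R[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.random((n_states, n_actions))

# Value iteration: Q(s,a) = R(s,a) + gamma * sum_{s'} P(s,a,s') * max_{a'} Q(s',a').
Q = np.zeros((n_states, n_actions))
for _ in range(5000):
    Q_new = R + gamma * (P @ Q.max(axis=1))
    if np.max(np.abs(Q_new - Q)) < 1e-10:
        Q = Q_new
        break
    Q = Q_new

# Fast decay of the normalized singular values suggests Q* is approximately low rank.
singular_values = np.linalg.svd(Q, compute_uv=False)
print("normalized singular values:", np.round(singular_values / singular_values[0], 4))
```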
