Adaptive Discretization in Reinforcement Learning

Abstract

Performance guarantees for reinforcement learning (RL) algorithms are typically stated for worst-case instances, which are pathological by design and rarely observed in meaningful applications. Moreover, many domains (such as computer systems and networking applications) have large state-action spaces and require algorithms to execute with low latency. These requirements highlight a trifecta of goals for practical RL algorithms: low sample, storage, and computational complexity. In this work, we develop an algorithmic framework for nonparametric RL with data-driven adaptive discretization. Our framework has provably better sample, storage, and computational complexity than uniform discretization or kernel regression methods. Moreover, we highlight how the performance guarantees are minimax optimal with respect to a novel instance-specific complexity measure that captures the structure of facility location and newsvendor models.
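The abstract does not spell out the mechanics of data-driven adaptive discretization, so the following is a minimal illustrative sketch of the general idea, not the paper's algorithm: maintain a tree partition of the state-action space and refine a cell only once its visit count exceeds a diameter-dependent threshold, so resolution concentrates where the data goes. The `Cell` class, the quadtree-style split, and the `(1/diameter)^2` threshold are all assumptions chosen for illustration.

```python
# Illustrative sketch of data-driven adaptive discretization (assumptions
# throughout; this is not the paper's exact algorithm): a tree partition of a
# 1-D state x 1-D action space, refined where visits accumulate.
from dataclasses import dataclass, field

@dataclass
class Cell:
    s_lo: float; s_hi: float          # state interval covered by this cell
    a_lo: float; a_hi: float          # action interval covered by this cell
    visits: int = 0
    q_est: float = 1.0                # optimistic initial Q-value estimate
    children: list = field(default_factory=list)

    def diameter(self) -> float:
        return max(self.s_hi - self.s_lo, self.a_hi - self.a_lo)

    def contains_state(self, s: float) -> bool:
        return self.s_lo <= s <= self.s_hi

    def split(self) -> None:
        """Refine into four subcells by halving the state and action sides."""
        sm, am = (self.s_lo + self.s_hi) / 2, (self.a_lo + self.a_hi) / 2
        for sl, sh in ((self.s_lo, sm), (sm, self.s_hi)):
            for al, ah in ((self.a_lo, am), (am, self.a_hi)):
                self.children.append(Cell(sl, sh, al, ah, q_est=self.q_est))

def select_cell(root: Cell, s: float) -> Cell:
    """Descend to a leaf covering state s, greedily by estimated Q-value."""
    node = root
    while node.children:
        node = max((c for c in node.children if c.contains_state(s)),
                   key=lambda c: c.q_est)
    return node

def update(cell: Cell, reward: float, lr: float = 0.1) -> None:
    """Record a visit, update the Q estimate, and split once the visit count
    exceeds the (assumed) diameter-dependent threshold (1/diameter)^2."""
    cell.visits += 1
    cell.q_est += lr * (reward - cell.q_est)
    if not cell.children and cell.visits >= (1.0 / cell.diameter()) ** 2:
        cell.split()

# Usage: one bandit-style interaction step on the unit square.
root = Cell(0.0, 1.0, 0.0, 1.0)
cell = select_cell(root, s=0.3)   # choose an action cell for state s = 0.3
update(cell, reward=0.7)          # feed back the observed reward
```

Because storage grows only where cells are actually split, the number of cells tracks how the data concentrates rather than a fixed uniform grid size, which is the intuition behind the improved storage and computational complexity claimed above.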
