Abstract
This paper proposes a new framework, Compressive Kernelized Reinforcement Learning (CKRL), for computing near-optimal policies in sequential decision making under uncertainty by combining non-adaptive, data-independent Random Projections with nonparametric Kernelized Least-squares Policy Iteration (KLSPI). Random Projections are a fast, non-adaptive dimensionality reduction technique in which high-dimensional data are projected onto a random lower-dimensional subspace via spherically random rotations and coordinate sampling. KLSPI introduces the kernel trick into the Least-Squares Policy Iteration (LSPI) framework for reinforcement learning, often achieving faster convergence and providing automatic feature selection via various kernel sparsification approaches. In the proposed approach, policies are computed in a low-dimensional subspace obtained by projecting the high-dimensional features onto a set of random bases. We first show how Random Projections constitute an efficient sparsification technique and how the method often converges faster than regular LSPI at lower computational cost. The theoretical foundation underlying this approach is a fast approximation of the Singular Value Decomposition (SVD). Finally, simulation results on benchmark MDP domains confirm gains in both computation time and performance in large feature spaces.
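As a rough illustration of the projection step, the sketch below compresses a high-dimensional feature matrix with a dense Gaussian random matrix, one standard Johnson-Lindenstrauss construction. The paper's own construction (spherically random rotations and coordinate sampling) may differ in detail, and the function names and dimensions here are illustrative assumptions, not the authors' code.

```python
import numpy as np

def random_projection(features, k, seed=None):
    """Project an n x d feature matrix onto a random k-dimensional subspace.

    Uses a dense Gaussian projection matrix with entries N(0, 1/k), so that
    inner products and distances are preserved in expectation; this is a
    stand-in for the paper's random-basis construction.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    projection = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return features @ projection, projection

# Toy usage: compress 10,000-dimensional features for 500 samples down to 50 dims.
Phi = np.random.rand(500, 10_000)              # hypothetical high-dimensional features
Phi_low, P = random_projection(Phi, k=50, seed=0)
print(Phi_low.shape)                           # (500, 50)
```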
Highlights
This paper explores a technique called compressive reinforcement learning, analogous to recent work on compressed sensing, wherein approximation spaces are constructed by measurements representing random correlations with value functions
A new framework, called Compressive Kernelized Reinforcement Learning (CKRL), for computing near-optimal policies in sequential decision making under uncertainty is proposed by combining non-adaptive, data-independent Random Projections with nonparametric Kernelized Least-squares Policy Iteration (KLSPI)
One of the central ideas is that Random Projections constitute an efficient sparsification technique, yielding both faster convergence and lower computational cost than regular Least-Squares Policy Iteration (LSPI); a sketch of the resulting compressed policy-evaluation step follows these highlights
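To make the cost claim concrete, here is a minimal sketch, assuming the standard LSTD-Q policy-evaluation step used inside LSPI, of how projecting the features before the least-squares solve shrinks the dominant d x d linear system to k x k. The data, dimensions, and function names below are hypothetical and not taken from the paper.

```python
import numpy as np

def lstd_q(phi, phi_next, rewards, gamma=0.99, reg=1e-6):
    """One LSTD-Q policy-evaluation solve: w = A^{-1} b with
    A = Phi^T (Phi - gamma * Phi') + reg * I and b = Phi^T r.
    The solve costs roughly O(d^3) in the feature dimension d, so replacing
    the raw features with their k-dimensional random projections reduces it
    to roughly O(k^3)."""
    d = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(d)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)

# Synthetic illustration: 2,000 transitions, 5,000-dim features compressed to 40.
rng = np.random.default_rng(0)
Phi      = rng.normal(size=(2_000, 5_000))
Phi_next = rng.normal(size=(2_000, 5_000))
r        = rng.normal(size=2_000)
P = rng.normal(0.0, 1.0 / np.sqrt(40), size=(5_000, 40))   # random projection matrix
w_low = lstd_q(Phi @ P, Phi_next @ P, r)                   # 40 x 40 solve instead of 5,000 x 5,000
print(w_low.shape)                                          # (40,)
```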
Summary
This paper explores a technique called compressive reinforcement learning, analogous to recent work on compressed sensing, wherein approximation spaces are constructed by measurements representing random correlations with value functions. A new framework, Compressive Kernelized Reinforcement Learning (CKRL), for computing near-optimal policies in sequential decision making under uncertainty is proposed by combining non-adaptive, data-independent Random Projections with nonparametric Kernelized Least-squares Policy Iteration (KLSPI). One of the central ideas is that Random Projections constitute an efficient sparsification technique, which brings about both faster convergence and lower computational cost than regular LSPI. The theoretical foundation underlying this approach is a fast approximation of the Singular Value Decomposition (SVD).