Abstract

The application of Reinforcement Learning (RL) algorithms to robot task learning is often limited by the high dimensionality of the state space, which can make a tabular model prohibitively expensive. In this paper, we describe LEAP (Learning Entities Adaptive Partitioning), a model-free learning algorithm that uses overlapping partitions, modified dynamically, to learn near-optimal policies with a small number of parameters. Starting from a coarse aggregation of the state space, LEAP generates refined partitions whenever it detects an incoherence between the current action values and the actual rewards from the environment. Since the adaptive process can lead to over-refinement in highly stochastic problems, we introduce a mechanism that prunes the macrostates without affecting the learned policy. Through refinement and pruning, LEAP builds a multi-resolution state representation that is specialized only where it is actually needed. In the last section, we present an experimental evaluation on a grid world and on a complex simulated robotic soccer task.

Keywords: State Space, Reinforcement Learn, Stochastic Environment, Reinforcement Learn Algorithm, Merge Action. These keywords were added by machine and not by the authors.
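The refinement idea described in the abstract can be illustrated with a minimal sketch. This is not the LEAP algorithm: it assumes a single non-overlapping partition of a one-dimensional integer state space, uses the running mean of the absolute temporal-difference error as a stand-in for the incoherence test, and omits pruning entirely. The names `Macrostate`, `AdaptivePartition`, `split_threshold`, and `min_visits` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Macrostate:
    """Half-open interval [lo, hi) of integer states sharing one set of action values."""
    lo: int
    hi: int
    q: list = field(default_factory=list)  # one action value per action
    visits: int = 0
    abs_error: float = 0.0                 # running mean of |TD error| (assumed criterion)


class AdaptivePartition:
    """Q-learning over macrostates that are split where the values fit the rewards poorly."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 split_threshold=0.3, min_visits=20):
        self.alpha = alpha
        self.gamma = gamma
        self.split_threshold = split_threshold
        self.min_visits = min_visits
        # Start from the coarsest aggregation: one macrostate covering every state.
        self.cells = [Macrostate(0, n_states, [0.0] * n_actions)]

    def find(self, state):
        """Return the macrostate that contains the given state."""
        for cell in self.cells:
            if cell.lo <= state < cell.hi:
                return cell
        raise ValueError(f"state {state} is outside the partition")

    def update(self, state, action, reward, next_state, done):
        """Apply one Q-learning update, then refine the visited macrostate if needed."""
        cell = self.find(state)
        target = reward
        if not done:
            target += self.gamma * max(self.find(next_state).q)
        td_error = target - cell.q[action]
        cell.q[action] += self.alpha * td_error
        cell.visits += 1
        cell.abs_error += (abs(td_error) - cell.abs_error) / cell.visits
        if self._should_split(cell):
            self._split(cell)

    def _should_split(self, cell):
        # Refine only cells that can still be divided, have been visited enough,
        # and whose values keep disagreeing with the observed targets.
        return (cell.hi - cell.lo > 1
                and cell.visits >= self.min_visits
                and cell.abs_error > self.split_threshold)

    def _split(self, cell):
        """Replace a cell by two halves that inherit its action values."""
        mid = (cell.lo + cell.hi) // 2
        index = self.cells.index(cell)
        left = Macrostate(cell.lo, mid, list(cell.q))
        right = Macrostate(mid, cell.hi, list(cell.q))
        self.cells[index:index + 1] = [left, right]


# Toy one-step problem (an assumption for this sketch): eight states, one action,
# reward 1 in the upper half of the state space and 0 in the lower half.
learner = AdaptivePartition(n_states=8, n_actions=1)
for step in range(400):
    state = 0 if step % 2 == 0 else 7
    reward = 1.0 if state >= 4 else 0.0
    learner.update(state, 0, reward, next_state=state, done=True)

boundaries = [(cell.lo, cell.hi) for cell in learner.cells]
```

In this toy problem a single macrostate cannot represent both reward levels, so its error stays high and it is split once at the midpoint; the two resulting macrostates then fit their rewards and are left alone. Handling overlapping partitions and pruning over-refined macrostates, as the abstract describes, would require additional machinery not shown here.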
