Abstract
Scaling end-to-end learning to control robots from vision inputs is a challenging problem in deep reinforcement learning (DRL). While achieving remarkable success in complex sequential tasks, vision-based DRL remains extremely data-inefficient, especially when dealing with high-dimensional pixel inputs. Many recent studies have tried to leverage state representation learning (SRL) to overcome this barrier; some of them even help the agent learn from pixels as efficiently as from states. Reproducing existing work, accurately judging the improvements offered by novel methods, and applying these approaches to new tasks are vital for sustaining this progress, yet meeting these three demands is seldom straightforward. Without meaningful criteria and tighter standardization of experimental reporting, it is difficult to determine whether improvements over previous methods are meaningful. For this reason, we conducted ablation studies on hyperparameters, embedding network architecture, embedding dimension, regularization methods, sample quality, and SRL methods to systematically compare and analyze their effects on representation learning and reinforcement learning. Three evaluation metrics are summarized, and five baseline algorithms (covering both value-based and policy-based methods) and eight tasks are adopted to avoid the particularity of any single experimental setting. Based on this wide range of experimental analyses, we highlight the variability in reported methods and suggest guidelines to make future SRL results more reproducible and stable. We aim to spur discussion about how to ensure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.
Highlights
Deep Reinforcement Learning is an emerging subfield of Reinforcement Learning (RL) that relies on deep neural networks as function approximators, enabling RL algorithms to work in complex environments
The performance of learnt latent feature representations is related to the selected deep reinforcement learning (DRL) algorithm
Although the ℓ1 regularization technique has a computational benefit, since features with zero coefficients can be dropped, it is found that ℓ1-norm based state representation learning (SRL) does not necessarily produce the expected positive results (a minimal sketch of such a penalty is given below)
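To make the ℓ1 highlight concrete, the following is a minimal sketch of an autoencoder-style SRL objective with an ℓ1 penalty on the latent state. The architecture, input size (3x64x64), loss weighting, and variable names are illustrative assumptions, not the paper's exact setup.

```python
# Minimal PyTorch sketch: reconstruction-based SRL with an L1 sparsity penalty.
# Shapes, coefficients, and names are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 50

encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 15 * 15, LATENT_DIM),   # 64x64 input -> 15x15 feature map
)
decoder = nn.Linear(LATENT_DIM, 3 * 64 * 64)

def srl_loss(obs, l1_coef=1e-3):
    """Reconstruction loss plus an L1 sparsity penalty on the latent features."""
    z = encoder(obs)                        # learned state representation
    recon = decoder(z).view_as(obs)         # reconstruct the raw observation
    recon_loss = F.mse_loss(recon, obs)
    l1_penalty = z.abs().mean()             # drives latent features toward zero
    return recon_loss + l1_coef * l1_penalty

loss = srl_loss(torch.rand(8, 3, 64, 64))   # a batch of raw pixel observations
```

The ℓ1 term encourages sparse latent codes, which is where the computational benefit comes from; whether it improves downstream control is exactly what the ablation studies examine.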
Summary
Deep Reinforcement Learning is an emerging subfield of Reinforcement Learning (RL) that relies on deep neural networks as function approximators, enabling RL algorithms to work in complex environments. Unlike classic reinforcement learning, where human-crafted representations are used, vision-based DRL has to learn features directly from raw observations in addition to policy learning; on the other hand, most RL approaches assume a fully observable state space, i.e., fully observable Markov Decision Processes (MDPs). This assumption is unworkable in real-world robotics due to factors such as limited sensor sensitivity, sensor noise, and uncertainty about whether the observation design is complete. Vision-based DRL typically suffers from slow learning and frequently requires an excessive amount of training time and data to attain the desired performance, making it unsuitable for real-world situations where data collection is difficult and expensive.
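To illustrate what "learning features in addition to policy learning" means in practice, the sketch below shares a pixel encoder between a policy head and an auxiliary reconstruction loss and updates everything jointly. The encoder/actor/decoder shapes, the auxiliary loss, and the REINFORCE-style surrogate are assumptions for illustration, not the paper's method.

```python
# Minimal PyTorch sketch: joint feature and policy learning from raw pixels.
# All architecture choices and loss weights below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, N_ACTIONS = 50, 4

encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 15 * 15, LATENT_DIM),          # for 3x64x64 observations
)
actor = nn.Linear(LATENT_DIM, N_ACTIONS)          # policy head on the latent state
aux_decoder = nn.Linear(LATENT_DIM, 3 * 64 * 64)  # auxiliary reconstruction head

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(actor.parameters()) + list(aux_decoder.parameters()),
    lr=3e-4,
)

obs = torch.rand(8, 3, 64, 64)                    # batch of raw pixel observations
actions = torch.randint(0, N_ACTIONS, (8,))       # placeholder actions taken in those states
returns = torch.rand(8)                           # placeholder returns for the surrogate loss

z = encoder(obs)                                  # representation learning and ...
dist = torch.distributions.Categorical(logits=actor(z))    # ... policy learning share the encoder
policy_loss = -(dist.log_prob(actions) * returns).mean()   # REINFORCE-style surrogate
recon_loss = F.mse_loss(aux_decoder(z).view_as(obs), obs)  # auxiliary SRL objective

loss = policy_loss + 0.1 * recon_loss             # joint update of encoder, actor, decoder
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the encoder receives gradients from both objectives, the quality of the learned representation and the choice of DRL algorithm are coupled, which is why the paper studies them together.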