Abstract

Data efficiency remains a major hurdle for deep reinforcement learning. We address this challenge by pretraining an encoder on unlabeled data and then finetuning it on a small amount of task-specific data. To encourage learning representations that capture diverse aspects of the underlying MDP, we employ a combination of latent dynamics modelling and unsupervised goal-conditioned RL. When limited to 100k steps of interaction on Atari games (equivalent to two hours of human experience), our approach significantly outperforms prior work combining offline representation pretraining with task-specific finetuning, and compares favourably with other pretraining methods that require orders of magnitude more data. Our approach shows particular promise when paired with larger models and more diverse, task-aligned observational data, approaching human-level performance and data efficiency on Atari in the best case.
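
The abstract gives no implementation details, but the sketch below illustrates how the two pretraining objectives it names (latent dynamics modelling and unsupervised goal-conditioned RL) might be combined into a single encoder-pretraining step. It is a minimal PyTorch sketch under assumptions not taken from the paper: the Encoder, LatentDynamics, and GoalQ modules, the hindsight goals drawn from within the batch, and the latent-distance reward are all illustrative.

```python
# Minimal sketch of encoder pretraining with (a) latent dynamics modelling and
# (b) an unsupervised goal-conditioned RL objective. Module sizes, names, the
# goal-relabelling scheme, and the reward are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps raw observations to latent representations."""
    def __init__(self, obs_dim=64, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))
    def forward(self, obs):
        return self.net(obs)

class LatentDynamics(nn.Module):
    """Predicts the next latent state from the current latent and the action."""
    def __init__(self, latent_dim=32, n_actions=6):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + n_actions, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))
    def forward(self, z, action_onehot):
        return self.net(torch.cat([z, action_onehot], dim=-1))

class GoalQ(nn.Module):
    """Goal-conditioned Q-head: scores actions for reaching a latent goal."""
    def __init__(self, latent_dim=32, n_actions=6):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_actions))
    def forward(self, z, z_goal):
        return self.net(torch.cat([z, z_goal], dim=-1))

def pretrain_step(encoder, dynamics, goal_q, batch, optimizer, gamma=0.99):
    """One gradient step on an unlabeled (obs, action, next_obs) batch.

    Goals are relabelled in hindsight from other transitions in the batch, and
    the goal-conditioned reward is negative latent distance -- a common
    unsupervised choice, assumed here rather than taken from the paper.
    """
    obs, action_onehot, next_obs = batch
    z, z_next = encoder(obs), encoder(next_obs)

    # (a) Latent dynamics modelling: predict the next latent state.
    z_next_pred = dynamics(z, action_onehot)
    dynamics_loss = F.mse_loss(z_next_pred, z_next.detach())

    # (b) Unsupervised goal-conditioned RL: hindsight goals, distance reward.
    z_goal = z_next[torch.randperm(z_next.size(0))].detach()
    reward = -((z_next.detach() - z_goal) ** 2).sum(-1)      # zero at the goal
    q = (goal_q(z, z_goal) * action_onehot).sum(-1)          # Q of taken action
    with torch.no_grad():
        target = reward + gamma * goal_q(z_next, z_goal).max(dim=-1).values
    goal_loss = F.mse_loss(q, target)

    loss = dynamics_loss + goal_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Tiny smoke test with random data standing in for unlabeled transitions.
    enc, dyn, gq = Encoder(), LatentDynamics(), GoalQ()
    opt = torch.optim.Adam([*enc.parameters(), *dyn.parameters(), *gq.parameters()], lr=3e-4)
    batch = (torch.randn(8, 64),
             F.one_hot(torch.randint(0, 6, (8,)), 6).float(),
             torch.randn(8, 64))
    print(pretrain_step(enc, dyn, gq, batch, opt))
```

In the setup described by the abstract, the pretrained encoder would then be finetuned, together with a task-specific head, on the limited 100k-step interaction budget.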
