Abstract

Offline meta-reinforcement learning (OMRL) aims to train agents that quickly adapt to new tasks using only pre-collected data. However, existing OMRL methods often spend many training iterations ineffectively and may suffer performance collapse in the later stages of training. We identify the root cause as the shallow memorization problem, in which agents overspecialize in specific solutions for encountered states, hindering their generalization. This issue arises from the loss of plasticity and the premature fitting of neural networks, which restrict the agents' exploration. To address this challenge, we propose Simple COntrastive Representation and Reset-Ensemble for OMRL (SCORE), a novel context-based OMRL approach. SCORE introduces an end-to-end contrastive learning framework without negative samples to pre-train a context encoder, enabling more robust task representations; the context encoder is then fine-tuned during meta-training. Furthermore, SCORE employs a Reset-Ensemble mechanism that periodically resets and ensembles partial networks to maintain the agents' continual learning ability and sharpen their perception of characteristics across diverse tasks. Extensive experiments demonstrate that SCORE effectively avoids premature fitting and exhibits excellent generalization performance.
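To make the "contrastive learning without negative samples" idea concrete, here is a minimal sketch of a negative-sample-free objective in the style of SimSiam applied to a context encoder. Everything here is illustrative and not from the paper: the linear encoder, the predictor head, and the noise-based "views" of a task context are all hypothetical stand-ins, and the stop-gradient is only indicated in comments since plain NumPy has no autograd.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    # Hypothetical linear context encoder (a stand-in for SCORE's network).
    return np.tanh(x @ W)

def negative_cosine(p, z):
    # Loss without negative samples: z is treated as a constant
    # (stop-gradient), so only the predictor branch p would receive
    # gradients during training.
    p = p / np.linalg.norm(p, axis=-1, keepdims=True)
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    return -np.mean(np.sum(p * z, axis=-1))

W = rng.normal(size=(8, 4))    # encoder weights (hypothetical)
Wp = rng.normal(size=(4, 4))   # predictor head (hypothetical)

# Two "views" of the same task context, e.g. two perturbed samples of
# transitions drawn from the same task.
ctx = rng.normal(size=(16, 8))
view1 = ctx + 0.1 * rng.normal(size=ctx.shape)
view2 = ctx + 0.1 * rng.normal(size=ctx.shape)

z1, z2 = encode(view1, W), encode(view2, W)
p1, p2 = z1 @ Wp, z2 @ Wp

# Symmetrized loss; it is minimized when both views of a task map to the
# same representation, with no negative pairs from other tasks required.
loss = 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)
print(float(loss))
```

Because the cosine similarity is bounded, the loss always lies in [-1, 1]; the stop-gradient on one branch is what prevents the trivial collapse that negative samples would otherwise guard against.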
