Abstract

Reinforcement learning is a machine learning method in which an agent learns by trial and error to solve decision optimization problems. Agents based on deep reinforcement learning are well known to be difficult to train in complex environments. Moreover, when the environment does not provide sufficient reward feedback, the agent may generate unsafe and strange actions. To help the agent converge to a better policy and to make its behavior safer and more controllable under sparse rewards, we propose a subgoal embedding method based on prior knowledge and a hierarchical strategy, which also makes the training process converge faster. The subgoal embedding method can be combined with existing reinforcement learning methods. In this paper, we combine it with the REINFORCE algorithm and the PPO (Proximal Policy Optimization) algorithm and test it in the MiniGrid-DoorKey game environment of the gym platform. The experiments demonstrate the effectiveness of the subgoal embedding method.

Keywords: Reinforcement learning, Deep reinforcement learning, Subgoal embedding, Sparse reward, Hierarchical strategies, Safe agent
