Abstract

To increase the adaptivity of hierarchical reinforcement learning (HRL) and accelerate learning in environments with multiple sources of reward, we propose an emotion-based HRL algorithm inspired by neurobiology. In the algorithm, each reward source defines a subtask, and each subtask is assigned an artificial emotion indication (AEI) that predicts the reward component associated with that subtask. The AEIs are learned simultaneously with the top-level policy and are used to interrupt subtask execution whenever they change significantly. The algorithm is tested in a simulated gridworld that has two sources of reward and is partially observable. Experimental results show that an artificial emotion mechanism that adaptively terminates subtasks enables efficient reuse of subtask policies in multigoal environments. The artificial emotion variables accelerate the learning process by 60% and achieve higher long-term reward compared to a human-designed policy and a restricted form of the MAXQ algorithm.
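
A minimal sketch of the interruption mechanism the abstract describes might look like the following. The environment, the state and subtask counts, the interruption threshold, and names such as `step_env` and `aei` are illustrative assumptions for exposition, not details taken from the paper: each subtask's AEI is a TD-learned prediction of that subtask's reward component, and a subtask is interrupted when any AEI shifts by more than a threshold, letting the top-level policy reconsider its choice.

```python
import random

N_STATES, N_SUBTASKS = 10, 2          # assumed sizes for the sketch
ALPHA, GAMMA, THRESHOLD = 0.1, 0.95, 0.5

# One AEI table per subtask: a learned prediction of that subtask's
# reward component, indexed by (abstract) state.
aei = [[0.0] * N_STATES for _ in range(N_SUBTASKS)]
# Top-level action values over subtasks.
q_top = [[0.0] * N_SUBTASKS for _ in range(N_STATES)]

def step_env(state, subtask):
    """Dummy stand-in environment: returns a next state and one
    reward component per subtask (the real paper uses a gridworld)."""
    next_state = random.randrange(N_STATES)
    rewards = [random.random() if random.random() < 0.1 else 0.0
               for _ in range(N_SUBTASKS)]
    return next_state, rewards

state = 0
for _ in range(1000):
    # Top-level policy: epsilon-greedy choice of subtask.
    if random.random() < 0.1:
        subtask = random.randrange(N_SUBTASKS)
    else:
        subtask = max(range(N_SUBTASKS), key=lambda k: q_top[state][k])

    start_state, total_reward, steps = state, 0.0, 0
    baseline = [aei[k][state] for k in range(N_SUBTASKS)]  # AEIs at selection time

    while True:
        next_state, rewards = step_env(state, subtask)
        total_reward += rewards[subtask]
        steps += 1

        # TD(0) update of every AEI from its own reward component.
        for k in range(N_SUBTASKS):
            target = rewards[k] + GAMMA * aei[k][next_state]
            aei[k][state] += ALPHA * (target - aei[k][state])

        state = next_state

        # Interrupt the subtask as soon as any AEI changes significantly
        # relative to its value when the subtask was selected.
        if any(abs(aei[k][state] - baseline[k]) > THRESHOLD
               for k in range(N_SUBTASKS)):
            break
        if steps >= 20:   # safety cap, specific to this sketch
            break

    # SMDP-style Q-learning update of the top-level policy.
    best_next = max(q_top[state])
    q_top[start_state][subtask] += ALPHA * (
        total_reward + (GAMMA ** steps) * best_next
        - q_top[start_state][subtask])
```

The key design point illustrated here is that interruption is driven by the learned AEIs rather than by fixed subtask termination conditions, which is what lets the hierarchy react to changes in other reward sources mid-subtask.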
