Abstract

A significant body of work on reinforcement learning has been focused on the single-agent tasks where the agent aims to learn a policy that maximizes the cumulative reward in a dynamic environment.1 In the past decades, quite a few single-agent-based reinforcement learning algorithms have been developed in the literature.1 Yet, it is increasingly recognized that the single-agent-based reinforcement learning algorithms may fail to effectively handle large-scale optimization (decision) tasks with joint features.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call