Abstract

Deep reinforcement learning (DRL) has achieved remarkable results on high-dimensional state tasks. However, it suffers from slow convergence and low sample efficiency when solving problems with large discrete action spaces. To meet these challenges, we develop a cooperative modular reinforcement learning (CMRL) method that solves large discrete action space problems in a distributed manner. A general yet effective task decomposition method is proposed to decompose the complex decision task over a large action space into multiple decision sub-tasks over small action subsets, using a rule-based action division method. The CMRL method, consisting of multiple Critic networks, is proposed to solve these sub-tasks: each Critic network learns a decomposed value function to obtain the locally optimal action within its sub-task, and the globally optimal action is chosen cooperatively from all locally optimal actions. Moreover, we propose a new parallel training mechanism that trains the multiple Critic networks, with different models and different data, in parallel. Mathematical properties are derived to analyze the rationality and superiority of CMRL. Four different simulation experiments are conducted to verify the generality and effectiveness of CMRL on large action space problems. The results show that CMRL achieves superior training efficiency compared with classical and state-of-the-art DRL methods while maintaining the accuracy of the solution.
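The cooperative selection step described above can be illustrated with a minimal sketch: each per-subset Critic proposes its locally optimal action, and the global action is the proposal with the highest value. This is only an assumed, simplified rendering of the idea; the class and function names (SubCritic, select_global_action), the network sizes, and the offset-based mapping from subset indices to global action indices are illustrative choices, not the authors' implementation.

```python
# Minimal sketch of cooperative action selection across per-subset Critics (assumed, simplified).
import torch
import torch.nn as nn

class SubCritic(nn.Module):
    """Q-network over one action subset of the decomposed action space."""
    def __init__(self, state_dim, n_sub_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_sub_actions),
        )

    def forward(self, state):
        return self.net(state)  # Q-values over this subset only

def select_global_action(state, critics, subset_offsets):
    """Each Critic proposes its locally optimal action; the global action is
    the proposal with the highest Q-value across all Critics."""
    best_q, best_action = -float("inf"), None
    for critic, offset in zip(critics, subset_offsets):
        q = critic(state)                 # shape (1, n_sub_actions)
        local_q, local_a = q.max(dim=1)   # locally optimal action in this subset
        if local_q.item() > best_q:
            best_q = local_q.item()
            best_action = offset + local_a.item()  # map back to a global action index
    return best_action, best_q

# Example: a 100-action space divided by a rule into 4 subsets of 25 actions each.
state_dim, subset_size, n_subsets = 8, 25, 4
critics = [SubCritic(state_dim, subset_size) for _ in range(n_subsets)]
offsets = [i * subset_size for i in range(n_subsets)]
state = torch.randn(1, state_dim)
action, q = select_global_action(state, critics, offsets)
print(f"global action {action} with Q-value {q:.3f}")
```

In this sketch each Critic only ever evaluates its own small subset, which is what keeps the per-network output dimension small; training each SubCritic on its own data stream would correspond to the parallel training mechanism mentioned in the abstract.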
