Abstract

Hierarchical reinforcement learning (HRL) is a promising approach for efficiently solving long-horizon decision-making tasks in the Internet of Things (IoT) domain. However, HRL algorithms typically rely on expert knowledge to preset an appropriate hierarchical structure for each IoT task, which raises trial costs and limits their wider application. In this paper, we propose Dynamic-Level Hierarchical Reinforcement Learning (DHRL), a method that adaptively searches for the optimal hierarchical structure while preserving the generality of the framework design. DHRL incorporates an embedded exploration-and-exploitation mechanism that addresses the challenges caused by the dependence between levels and balances expected benefit against the accuracy of the current evaluation. Nonetheless, the additional exploration inevitably degrades performance. To mitigate this effect, we propose a synchronous training architecture that allows DHRL to operate in a distributed, parallel manner, and we introduce an adaptive evolutionary method to accelerate convergence. Extensive experimental evaluations demonstrate the effectiveness of our theory and method.
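To make the exploration-and-exploitation mechanism over hierarchy structures concrete, the sketch below shows one way such a search could look: a UCB-style bandit that adaptively chooses the number of hierarchy levels to train with. This is a minimal illustration, not the paper's DHRL algorithm; the names `candidate_levels` and `train_and_evaluate` are hypothetical, and the evaluation is stubbed with noise.

```python
# Hypothetical sketch: bandit-style search over the number of HRL levels,
# trading off observed return (exploitation) against trial uncertainty
# (exploration). Not the DHRL implementation from the paper.
import math
import random

candidate_levels = [1, 2, 3, 4]               # hypothetical candidate hierarchy depths
counts = {k: 0 for k in candidate_levels}     # times each depth has been tried
returns = {k: 0.0 for k in candidate_levels}  # running mean evaluation score per depth

def train_and_evaluate(num_levels: int) -> float:
    """Placeholder for training an HRL agent with `num_levels` levels and
    returning its evaluation score; stubbed with Gaussian noise here."""
    return random.gauss(mu=0.5 * num_levels, sigma=1.0)

def select_depth(t: int) -> int:
    # Try every depth once, then pick the depth with the highest UCB score:
    # mean return plus a bonus that shrinks as a depth is tried more often.
    for k in candidate_levels:
        if counts[k] == 0:
            return k
    return max(candidate_levels,
               key=lambda k: returns[k] + math.sqrt(2 * math.log(t) / counts[k]))

for t in range(1, 101):
    k = select_depth(t)
    score = train_and_evaluate(k)
    counts[k] += 1
    returns[k] += (score - returns[k]) / counts[k]  # incremental mean update

print({k: round(v, 2) for k, v in returns.items()})
```

Under this reading, the exploration bonus corresponds to the abstract's "current evaluation accuracy" (few trials mean an unreliable estimate), while the mean return corresponds to "maximizing benefits"; DHRL's actual mechanism additionally handles the dependence between levels, which a flat bandit like this does not capture.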
