Abstract
In agent control problems, the idea of combining reinforcement learning and planning has attracted much attention. The two methods focus on micro-level and macro-level actions, respectively, and their advantages compound when they cooperate well. Essential to this cooperation is finding an appropriate boundary that assigns different functions to each method; such a boundary can be represented by the parameters of a planning algorithm. In this paper, we develop an optimization strategy for planning parameters through an analysis of the connection between reaction and planning, and we also develop a non-gradient method to accelerate the optimization. The full algorithm finds a satisfactory setting of the planning parameters, making full use of the reaction capability of a specific agent.
Highlights
The solution of many continuous decision problems can be described as a process in which an agent sets out from an initial state, passes through a series of intermediate states, and reaches a goal state
An optimization strategy is created to find a satisfactory setting of the parameters
The experiments show that, with an appropriate setting, pattern search can quickly find a satisfactory configuration of two planning parameters. The method can be extended to more sophisticated problems with more than three planning parameters to optimize without gradients, as long as an optimization strategy is given
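The gradient-free pattern search mentioned above can be sketched as follows. This is a minimal compass-style pattern search, not the paper's exact implementation: the objective below is a hypothetical stand-in for the true task-performance measure of two planning parameters.

```python
def pattern_search(objective, x0, step=1.0, shrink=0.5, tol=1e-3, max_iter=200):
    """Minimal compass/pattern search: probe +/- step along each axis,
    move to the best improving point, otherwise shrink the step."""
    x = list(x0)
    fx = objective(x)
    for _ in range(max_iter):
        if step <= tol:
            break
        best_x, best_f = x, fx
        for i in range(len(x)):
            for d in (step, -step):
                cand = list(x)
                cand[i] += d
                f = objective(cand)
                if f < best_f:
                    best_x, best_f = cand, f
        if best_f < fx:
            x, fx = best_x, best_f  # accept the improving move
        else:
            step *= shrink          # no improvement: refine the mesh
    return x, fx

# Hypothetical stand-in objective over two planning parameters
# (e.g. a smooth proxy for failure rate plus computational cost).
obj = lambda p: (p[0] - 3.0) ** 2 + (p[1] - 1.5) ** 2
params, value = pattern_search(obj, [0.0, 0.0])
```

Because the probe-and-shrink loop uses only function evaluations, it applies directly to settings where the objective (e.g. task success rate) has no usable gradient.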
Summary
The solution of many continuous decision problems can be described as a process in which an agent sets out from an initial state, passes through a series of intermediate states, and reaches a goal state. A recent work (SoRB) [3] showed a novel way to handle problems in a complicated scene: the agent first samples states as waypoints, connects the waypoints into a planning graph, finds a shortest path in the graph, and reacts along the waypoints on that path. Its key tool for incorporating planning techniques into RL is the distance estimate obtained from RL. We propose an online adaptation algorithm that adjusts the planning parameters based on the complexity of the state space and the reaction capability of the agent. With this algorithm, tasks can be handled with relatively little computational cost and a high success rate
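The sample-connect-plan pipeline described above can be sketched as follows. This is a simplified illustration, not SoRB itself: the RL-derived distance estimate is replaced by Euclidean distance so the example is self-contained, the waypoints are hand-picked 2D points, and the edge threshold `max_dist` stands in for one of the tunable planning parameters.

```python
import heapq
import math

def learned_distance(a, b):
    # Placeholder for an RL-derived distance estimate d(s, g);
    # here a plain Euclidean distance keeps the sketch runnable.
    return math.dist(a, b)

def build_graph(waypoints, max_dist):
    """Connect waypoint pairs whose estimated distance is below a
    threshold (a planning parameter)."""
    graph = {i: [] for i in range(len(waypoints))}
    for i in range(len(waypoints)):
        for j in range(i + 1, len(waypoints)):
            d = learned_distance(waypoints[i], waypoints[j])
            if d <= max_dist:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph

def shortest_path(graph, start, goal):
    """Dijkstra over the waypoint graph; returns waypoint indices."""
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    if goal not in dist:
        return None  # graph too sparse: threshold set too low
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

# Hand-picked sample states between a start (0,0) and a goal (10,10).
waypoints = [(0, 0), (2, 1), (4, 2), (6, 4), (8, 6), (9, 8), (10, 10)]
graph = build_graph(waypoints, max_dist=3.0)
path = shortest_path(graph, 0, len(waypoints) - 1)
```

The agent would then react toward each waypoint on `path` in turn. Note how `max_dist` embodies the boundary between planning and reaction: a larger threshold leaves longer segments to the reactive policy, while a smaller one demands denser planning.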