Abstract

In agent control problems, the idea of combining reinforcement learning and planning has attracted much attention. The two methods focus on micro-level and macro-level actions respectively, and their advantages combine when the two cooperate well. Essential to this cooperation is finding an appropriate boundary that assigns a distinct role to each method. Such a boundary can be represented by the parameters of a planning algorithm. In this paper, we create an optimization strategy for planning parameters by analyzing the connection between reaction and planning; we also create a gradient-free method to accelerate the optimization. The whole algorithm can find a satisfactory setting of the planning parameters, making full use of the reaction capability of a specific agent.

Highlights

  • The solution of many continuous decision problems can be described as a process in which the agent sets out from the initial state, passes through a series of intermediate states, and reaches the goal state

  • An optimization strategy is created to find a satisfactory setting of the parameters

  • The experiments show that, with an appropriate parameter setting, pattern search can quickly find a satisfactory setting of two planning parameters. The method can be extended to more sophisticated problems where there are more than three planning parameters to optimize without gradients, as long as the optimization strategy is given


Summary

Introduction

The solution of many continuous decision problems can be described as a process in which the agent sets out from the initial state, passes through a series of intermediate states, and reaches the goal state. A recent work (SoRB) [3] showed a novel way to handle problems in a complicated scene: the agent first samples states as waypoints, connects the waypoints to build a planning graph, finds a shortest path in the graph, and reacts along the waypoints on that path. It identified a powerful tool for incorporating planning techniques into RL: distance estimates obtained from RL. We propose an online adapting algorithm that adjusts the planning parameters based on the complexity of the state space and the reaction capability of the agent. With this algorithm, tasks can be handled with relatively little computational cost and a high success rate.
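The highlights name pattern search as the gradient-free optimizer for the planning parameters. As a minimal sketch of the general technique (not the paper's implementation), a Hooke–Jeeves-style coordinate pattern search could look like the following; `objective` here is a hypothetical smooth stand-in for the paper's real cost, which would involve running the agent and measuring success rate and computational cost:

```python
def objective(params):
    # Hypothetical stand-in objective; the real one would evaluate
    # the agent's success rate and planning cost at these parameters.
    x, y = params
    return (x - 3.0) ** 2 + (y - 1.5) ** 2

def pattern_search(f, start, step=1.0, shrink=0.5, tol=1e-3, max_iter=200):
    """Gradient-free coordinate pattern search (Hooke-Jeeves style)."""
    best = list(start)
    best_val = f(best)
    for _ in range(max_iter):
        improved = False
        for i in range(len(best)):          # poll each parameter in turn
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                val = f(trial)
                if val < best_val:          # accept any improving move
                    best, best_val = trial, val
                    improved = True
        if not improved:
            step *= shrink                  # no move helped: refine the mesh
            if step < tol:
                break
    return best, best_val

params, cost = pattern_search(objective, start=[0.0, 0.0])
print(params, cost)   # converges near [3.0, 1.5]
```

Because the search only compares objective values, it needs no gradient, which is what makes it usable when the objective is the outcome of running an agent rather than a differentiable function.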

Background
Algorithm
Optimizing Planning Parameters
Pattern Search
Changing Process of Planning Parameters
Comparison of Different Planning Parameters Settings
Discussion and Future
A Introducing Planning
B Deliberate Training
C Problematic Distance Estimates
D Comparison of Different Reaction Capability
E Another Pattern Search Method
F Environment and Hyperparameters

