Abstract

Many policy-improvement systems have been proposed for Reinforcement Learning (RL) agents that adapt quickly to environmental change by using statistical methods such as mixture models of Bayesian networks, mixture probability, and clustering distributions. However, such methods increase computational complexity. In addition, adaptation to more complex environments, such as multi-layer environments, is required. In this study, we used the profit-sharing method for the agent to learn its policy, and added a mixture probability to the RL system so that it recognizes changes in the environment and improves the agent’s policy to suit the changed environment. We also introduced a clustering step that enables a smaller, suitable selection in order to reduce computational complexity while maintaining the system’s performance. The experimental results showed that the agent successfully learned the policy and adjusted efficiently to changes in a multi-layer environment. Finally, the proposed system kept both the computational complexity and the decline in policy-improvement effectiveness under control.
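To make the idea of a mixture probability concrete, the following is a minimal sketch, not the authors' formulation. It assumes that each previously learned environment contributes one action distribution for the current state and that the mixing weights are given. The names `mix_policies`, `known`, and `mixing` are hypothetical, and the excerpt does not say how the paper computes its mixing parameters.

```python
def mix_policies(policies, mixing):
    """Blend action distributions as a weighted average.

    policies: list of dicts mapping action -> probability (assumed inputs).
    mixing:   list of non-negative weights, one per policy (assumed given).
    """
    total = sum(mixing)
    actions = list(policies[0].keys())
    return {
        a: sum(m * p[a] for m, p in zip(mixing, policies)) / total
        for a in actions
    }


# Two hypothetical policies learned in earlier environments.
known = [
    {"up": 0.7, "right": 0.3},
    {"up": 0.2, "right": 0.8},
]

# Weighting the first policy three times as heavily as the second.
mixed = mix_policies(known, mixing=[3.0, 1.0])
```

Dividing by the sum of the weights keeps the result a valid probability distribution even when the weights are not normalized. Restricting `policies` to a smaller subset is one way the cost of the mixture could be reduced, which is the role the abstract assigns to clustering.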

Highlights

  • Along with the increasing need for rescue robots in disasters such as earthquakes and tsunami, there is an urgent … How to cite this paper: Phommasak, U., Kitakoshi, D., Shioya, H. and Maeda, J. (2014) A Reinforcement Learning System to Dynamic Movement and Multi-Layer Environments

  • The purpose of the experiment is to learn the policy in unknown dynamic environments EA, EB and EC in three cases, by employing only the profit-sharing method and the mixture probability scheme; the evaluation is based on the success rate of 2000 trials

  • The success rates of policy improvement in EA, EB and EC, using only the profit-sharing method and using mixture probabilities with clustering, are shown in Figure 6, and the processing times from Step 3 until the end of the experiment, using all 50 elements and using only 35, 25 and 15 elements, are shown in … and Table 4, respectively


Summary

Introduction

Along with the increasing need for rescue robots in disasters such as earthquakes and tsunami, there is an urgent … Effective adjustment to an unknown environment becomes possible by using statistical methods, such as a Bayesian network model [5] [6] or mixture probability and clustering distribution [7] [8], which are built from observational data on multiple environments that the agents have learned in the past [9] [10]. Although mixture probability and clustering distribution kept the computational complexity under control while maintaining the system’s performance, those experiments were conducted only on 2D environments with fixed obstacles. We describe modifications to the profit-sharing method, with new parameters, that make it possible to work in multi-layer environments with dynamic movement.
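As background for the profit-sharing method mentioned above, here is a minimal sketch of the generic technique, not the modified version with new parameters that the paper describes. It assumes a tabular weight per state-action pair, a geometrically decreasing credit function, and roulette-wheel action selection. The names `reinforce`, `select_action`, `decay`, and the initial weight of 1.0 are illustrative assumptions.

```python
import random
from collections import defaultdict


def reinforce(weights, episode, reward, decay=0.5):
    """Share `reward` backwards along an episode of (state, action) pairs.

    The final step receives the full reward; each earlier step receives
    the previous credit multiplied by `decay` (an assumed parameter).
    """
    credit = reward
    for state, action in reversed(episode):
        weights[(state, action)] += credit
        credit *= decay


def select_action(weights, state, actions, rng=random):
    """Pick an action with probability proportional to its weight."""
    w = [weights[(state, a)] for a in actions]
    return rng.choices(actions, weights=w, k=1)[0]


# Every rule starts with weight 1.0 (assumed) so all actions can be chosen.
weights = defaultdict(lambda: 1.0)

# A hypothetical three-step episode that ended at the goal.
episode = [("s0", "right"), ("s1", "right"), ("s2", "up")]
reinforce(weights, episode, reward=8.0, decay=0.5)
```

With these values the last rule gains 8, the one before it gains 4, and the first gains 2, so rules closer to the reward are reinforced more strongly. Calling `select_action(weights, "s0", ["right", "left"])` then favors `"right"` over `"left"` in proportion to their weights.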

Profit-Sharing
Mixture Probability
Clustering Distributions
Flow System
Experiments
Experimental Setup
Discussion
Findings
Conclusions