Abstract

The key issue that prevents the application of Reinforcement Learning (RL) methods in complex control scenarios is the lack of convergence to meaningful decision policies (i.e., policies that differ significantly from random decisions), due to the huge state-action spaces to be explored. Providing the agent with initial domain knowledge alleviates this problem. This is known as Conditioned RL (CRL). In domains with high-dimensional continuous state-action spaces and rewards, CRL is often the only feasible approach to reach meaningful decision policies. In such systems, RL is carried out by Actor-Critic approaches, and the state-action value functionals are modeled by Value Function Approximations (VFA). CRL methods make use of an existing reference controller, i.e., the teacher controller, which provides the initial domain knowledge to the agent under training. The teacher controller can be used in two ways to build the VFA of the state-action value and state transition functions that determine the action selection policy: (1) providing the desired output for a supervised learning process, or (2) directly using it to build them. We have carried out experiments to compare CRL methods and unconditioned Actor-Critic agents in three different control benchmark scenarios. Results show that both agent conditioning approaches result in significant performance improvements. Under tight computational time constraints, CRL approaches were able to learn efficient policies, while the unconditioned agents were not able to find any acceptable policy in the benchmark control scenarios.
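
The first conditioning mode, in which the teacher controller supplies the desired output for a supervised learning process, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional state, the proportional teacher law and its gain, and the linear actor fitted by least squares stand in for the VFA-based Actor-Critic agents the abstract refers to.

```python
import numpy as np


def teacher_controller(state, gain=2.0):
    """Hypothetical teacher: a proportional control law u = -gain * x."""
    return -gain * state


def pretrain_actor(states, teacher_actions):
    """Fit a linear actor u = w * x + b to the teacher's actions (least squares)."""
    # Design matrix with a bias column, so the fit recovers both w and b.
    features = np.column_stack([states, np.ones_like(states)])
    params, *_ = np.linalg.lstsq(features, teacher_actions, rcond=None)
    return params  # array [w, b]


# Sample states, query the teacher for target actions, then fit the actor.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=200)
teacher_actions = teacher_controller(states)
w, b = pretrain_actor(states, teacher_actions)


def actor(state):
    """Conditioned initial policy; RL training would refine it from here."""
    return w * state + b
```

In this toy setting the fitted actor reproduces the teacher, so the agent would start exploring from a sensible policy rather than from random decisions. The second mode, using the teacher directly to build the approximations, is not covered by this sketch.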
