Abstract

In future service robots, complex systems can encounter unsafe exceptional circumstances that are hard to foresee. This paper investigates the assumption of having no prior knowledge about the environment, using reinforcement learning as an option for learning behavior by trial and error. In such a scenario, action-selection decisions are based on predictions of future reward so as to minimize the cost of reaching a goal. It is shown that the selection of safety-critical actions, which incur highly negative costs from the environment, is directly related to the exploration/exploitation dilemma in temporal-difference learning. To this end, several exploration policies are investigated with regard to worst- and best-case performance in a dynamic environment. Our results show that, in contrast to established exploration policies such as ε-greedy and Softmax, the recently proposed VDBE-Softmax policy appears more appropriate for such applications because its exploration parameter is robust to unexpected situations.
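To make the contrast between the compared policies concrete, the following Python sketch illustrates how ε-greedy, Softmax, and a state-dependent VDBE-Softmax rule select actions from a tabular Q-function. It is an illustrative approximation, not the authors' implementation; the class and function names, parameter values, and the simplified form of the value-difference update are assumptions based on the published VDBE-Softmax idea.

import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon take a uniformly random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax(q_values, tau=0.5):
    """Boltzmann exploration: sample actions proportionally to exp(Q / tau)."""
    prefs = np.exp((q_values - np.max(q_values)) / tau)  # shift for numerical stability
    return int(rng.choice(len(q_values), p=prefs / prefs.sum()))

class VDBESoftmax:
    """Sketch of Value-Difference Based Exploration with Softmax action selection.

    A per-state epsilon grows when value estimates change strongly (learning is
    still uncertain) and decays toward exploitation as learning converges;
    exploratory actions are drawn via Softmax rather than uniformly at random."""

    def __init__(self, n_states, n_actions, sigma=1.0, tau=0.5):
        self.eps = np.ones(n_states)      # start fully exploratory in every state
        self.sigma = sigma                # sensitivity to the size of value changes
        self.tau = tau
        self.delta = 1.0 / n_actions      # common choice for the mixing rate (assumption)

    def select(self, q_row, state):
        if rng.random() < self.eps[state]:
            return softmax(q_row, self.tau)
        return int(np.argmax(q_row))

    def update_epsilon(self, state, value_change):
        # Boltzmann-like function of the absolute change in Q(s, a) after a TD update
        f = (1.0 - np.exp(-abs(value_change) / self.sigma)) / \
            (1.0 + np.exp(-abs(value_change) / self.sigma))
        self.eps[state] = self.delta * f + (1.0 - self.delta) * self.eps[state]

In a temporal-difference learner, update_epsilon would be called after each Q-value update with the observed change in that value, so that states whose costs suddenly shift (for example, a newly unsafe region of a dynamic environment) automatically trigger renewed exploration. This is the robustness property the abstract attributes to VDBE-Softmax.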
