Novel Algorithm for Agent Navigation Based on Intrinsic Motivation Due to Boredom

Oscar Loyola, Claudio Urrea, John Kern

doi:10.5755/j01.itc.50.3.29242


Open Access

https://doi.org/10.5755/j01.itc.50.3.29242


Abstract

We propose a novel algorithm for agent navigation based on reinforcement learning, using boredom as an element of intrinsic motivation. Simulations show the improvements obtained by including this element over classic strategies. Boredom is modeled through a chaotic element that generates conditions for the creation of routes when the environment offers no reward, prompting the robot to navigate. Our proposal seeks to avoid the time that classical algorithms lose in scenarios without rewards. We demonstrate experimentally that adding the element of boredom makes it possible to generate routes in scenarios in which rewards do not exist, allowing these strategies to be used in real circumstances and facilitating the robot's navigation towards its objective. The most important contribution of this work is showing that navigation can be improved in scenarios that are completely adverse for a reward-based navigation algorithm.

Highlights

Reinforcement learning (RL) is one of the most common techniques in the field of machine learning [15, 29]
When two agents are trained under normal operating conditions, with a specific reward, in a known world of size 7x7, the Q-Learning algorithm shows a slight superiority over SARSA in the time of convergence towards a solution
A) shows how the Q-Learning algorithm falls into a minimum and stays there until the time limit
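
The comparison in the highlights rests on the different update targets used by Q-Learning and SARSA. The following is a minimal sketch of both updates on a 7x7 grid, not the authors' implementation: the goal cell, the start cell, the reward of 1.0, the hyperparameters `ALPHA`, `GAMMA`, `EPSILON`, and all function names are assumptions chosen for illustration, and the boredom element itself is not modeled here.

```python
import random

N = 7                                         # grid side, as in the 7x7 world
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GOAL = (N - 1, N - 1)                         # assumed goal cell (hypothetical)
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1        # assumed hyperparameters


def step(state, action):
    """Move inside the grid; reward 1.0 on reaching GOAL, otherwise 0.0."""
    r = min(max(state[0] + ACTIONS[action][0], 0), N - 1)
    c = min(max(state[1] + ACTIONS[action][1], 0), N - 1)
    nxt = (r, c)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def choose(q, state, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    values = [q.get((state, a), 0.0) for a in range(len(ACTIONS))]
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])


def q_learning_update(q, s, a, reward, s2, done):
    """Off-policy target: bootstrap from the greedy value of s2."""
    best_next = 0.0 if done else max(
        q.get((s2, b), 0.0) for b in range(len(ACTIONS)))
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + ALPHA * (reward + GAMMA * best_next - old)


def sarsa_update(q, s, a, reward, s2, a2, done):
    """On-policy target: bootstrap from the action actually chosen in s2."""
    next_value = 0.0 if done else q.get((s2, a2), 0.0)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + ALPHA * (reward + GAMMA * next_value - old)


def train(algorithm, episodes=200, max_steps=500, seed=0):
    """Return the Q-table and the number of steps taken in each episode."""
    rng = random.Random(seed)
    q, lengths = {}, []
    for _ in range(episodes):
        s = (0, 0)                            # assumed start cell
        a = choose(q, s, rng)
        steps = 0
        for _ in range(max_steps):
            steps += 1
            s2, reward, done = step(s, a)
            if algorithm == "q_learning":
                q_learning_update(q, s, a, reward, s2, done)
                a2 = choose(q, s2, rng)
            else:
                a2 = choose(q, s2, rng)
                sarsa_update(q, s, a, reward, s2, a2, done)
            s, a = s2, a2
            if done:
                break
        lengths.append(steps)
    return q, lengths
```

Calling `train("q_learning")` and `train("sarsa")` and comparing the returned episode lengths gives a rough view of convergence time. With every reward set to 0.0, both targets stay at zero and the Q-table never changes, which is the reward-free situation the boredom element is meant to address.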

Summary

Introduction

Reinforcement learning (RL) is one of the most common techniques in the field of machine learning [15, 29]. Many algorithms use extrinsic reward elements to improve learning, as indicated in [1, 25]; in these, the agent depends on what the environment can offer it. Human beings, by contrast, can determine their objectives according to their particular abilities [27], so the context in which a person finds themselves does not determine how far they can carry out a given task. This attribute can be linked to work on emotions such as [14, 26].

Objectives

Methods

Results

Conclusion