Abstract

In this paper, we formulate the active SLAM paradigm in terms of model-free Deep Reinforcement Learning, embedding the traditional utility functions based on the Theory of Optimal Experimental Design into the rewards, and thereby relaxing the intensive computations of classical approaches. We validate this formulation in a complex simulation environment, using a state-of-the-art deep Q-learning architecture with laser measurements as network inputs. Trained agents become capable not only of learning a policy to navigate and explore in the absence of an environment model, but also of transferring their knowledge to previously unseen maps, which is a key requirement in robotic exploration.
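
To make the reward embedding concrete, the sketch below derives a scalar reward from the D-optimality criterion of a pose covariance matrix, one of the classical utility functions in the Theory of Optimal Experimental Design. It is a minimal illustration under our own assumptions: the function name, the choice of D-optimality, and the sign convention are illustrative, not taken verbatim from the paper.

```python
import numpy as np

def d_optimality_reward(cov: np.ndarray) -> float:
    """Hypothetical TOED-style reward from a pose covariance matrix.

    D-optimality is det(cov)**(1/n), i.e. the geometric mean of the
    eigenvalues; lower uncertainty means a smaller criterion, so the
    reward is its negation.
    """
    eigvals = np.linalg.eigvalsh(cov)          # covariance is symmetric
    eigvals = np.clip(eigvals, 1e-12, None)    # guard against zero eigenvalues
    d_opt = np.exp(np.mean(np.log(eigvals)))   # numerically stable det**(1/n)
    return -d_opt                              # less uncertainty -> higher reward

# Example: a 3x3 covariance over (x, y, yaw) after a hypothetical SLAM update.
print(d_optimality_reward(np.diag([0.04, 0.03, 0.01])))
```

Working with the geometric mean of the eigenvalues rather than the raw determinant keeps the reward on a comparable scale regardless of the state dimension.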

Highlights

  • Simultaneous Localization and Mapping (SLAM) refers to the problem of incrementally building the map of a previously unseen environment while at the same time localizing the robot within it

  • Active SLAM consists of three stages [8]: (i) the identification of all possible locations to explore, (ii) the computation of the utility or reward generated by the actions that would take the robot from its current position to each of those locations and (iii) the selection and execution of the optimal action

  • We aim to study the potential of Deep Reinforcement Learning for the active SLAM problem

Introduction

Simultaneous Localization and Mapping (SLAM) refers to the problem of incrementally building the map of a previously unseen environment while at the same time localizing the robot within it. Active SLAM augments this problem with decision-making: it can be defined as the paradigm of controlling a robot that is performing SLAM so as to reduce the uncertainty of both its localization and the map representation [6,7]. It consists of three stages [8]: (i) the identification of all possible locations to explore (ideally infinite), (ii) the computation of the utility or reward generated by the actions that would take the robot from its current position to each of those locations, and (iii) the selection and execution of the optimal action. This quantification of utility can be done on the basis of either information theory or the Theory of Optimal Experimental Design.
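
The three-stage loop above can be summarized in a few lines of code. The sketch below is a schematic of the classical pipeline, with a hypothetical utility function (its name and its information-gain-minus-travel-cost form are our own illustrative assumptions) standing in for the information-theoretic or TOED-based criteria; candidate generation, e.g. frontier detection, is assumed given. In the reinforcement learning formulation of this paper, stages (ii) and (iii) are instead folded into a learned policy.

```python
import numpy as np

# Hypothetical utility: expected information gain minus travel cost.
def utility(pose, goal, expected_info_gain=1.0):
    travel_cost = np.linalg.norm(np.asarray(goal) - np.asarray(pose))
    return expected_info_gain - travel_cost

def active_slam_step(pose, candidates):
    """One iteration of the classical active SLAM loop:
    (i) candidate locations are assumed given (e.g., frontier points),
    (ii) the action reaching each candidate is scored with a utility,
    (iii) the highest-utility action is selected for execution.
    """
    scores = [utility(pose, goal) for goal in candidates]
    return candidates[int(np.argmax(scores))]

# Example: pick among three frontier candidates from the origin.
print(active_slam_step((0.0, 0.0), [(2.0, 1.0), (0.5, 0.5), (3.0, 4.0)]))
```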
