Abstract

Reinforcement learning is one of the most promising machine learning techniques for obtaining intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms takes the form of a value function expressed as a numeric table or a function approximator, and the learned behavior is then derived as a greedy policy with respect to this value function. Nevertheless, the learned policy sometimes does not meet expectations, and authoring it is difficult and unsafe because modifying a single value or parameter of the learned value function has unpredictable consequences in the space of policies it represents. This rules out direct manipulation of the learned value function as a method to modify the derived behaviors. In this paper, we propose the use of Inverse Reinforcement Learning to incorporate real behavior traces into the learning process in order to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to include information from trajectories of real pedestrians in the process of learning how to navigate inside a virtual 3D space that represents the real environment. A comparison with the behaviors learned using a classic Reinforcement Learning algorithm, Sarsa(λ), shows that the Inverse Reinforcement Learning behaviors fit the real trajectories significantly better.
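
As a concrete illustration of the policy-derivation step described in the abstract, the minimal sketch below contrasts the greedy policy read from a tabular Q-value function (as in classic TD control such as Sarsa(λ)) with the soft, maximum-entropy policy used by Soft Q-learning. The state/action sizes and the temperature are hypothetical placeholders, not values taken from the paper.

    import numpy as np

    # Hypothetical tabular Q-function for a tiny discrete MDP (illustrative sizes).
    n_states, n_actions = 10, 4
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(n_states, n_actions))

    def greedy_policy(Q, s):
        # Classic TD control: act greedily with respect to the learned value function.
        return int(np.argmax(Q[s]))

    def soft_policy(Q, s, temperature=1.0):
        # Soft Q-learning: maximum-entropy (Boltzmann) policy,
        # pi(a|s) proportional to exp(Q(s, a) / temperature).
        logits = Q[s] / temperature
        logits = logits - logits.max()      # numerical stability
        probs = np.exp(logits)
        probs = probs / probs.sum()
        return int(rng.choice(n_actions, p=probs))

    print(greedy_policy(Q, 3), soft_policy(Q, 3))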

Highlights

  • Reinforcement learning (RL) [1] is a challenging and promising machine learning field that has been extensively applied over the past years to problem domains such as robot control, simulation, quality control, and logistics [2,3,4,5]

  • The reward weights are updated by the feature-matching step $\theta_{i+1} = \theta_i + \alpha(i)\,(\tilde{f}_{\varsigma} - f^{\tilde{\pi}_{i+1}})$ (sketched in code after this list); in this paper, we propose the use of Inverse Reinforcement Learning (IRL) to include information from the real world inside the learning process

  • The results of the experiment demonstrate the effectiveness of the proposed approach using the IRL framework for pedestrian navigation
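
The update in the second highlight can be read as a feature-expectation-matching gradient step on the reward weights. The sketch below assumes a reward that is linear in hand-crafted state features and a decaying step size; the feature map, the toy trajectories, and the step-size schedule are illustrative placeholders, not the paper's MARL-Ped setup.

    import numpy as np

    def feature_expectations(trajectories, phi, gamma=0.99):
        # Empirical discounted feature expectations over a set of state trajectories.
        # phi(s) maps a state to a feature vector (placeholder feature map below).
        f = np.zeros_like(phi(trajectories[0][0]), dtype=float)
        for traj in trajectories:
            for t, s in enumerate(traj):
                f += (gamma ** t) * phi(s)
        return f / len(trajectories)

    def irl_weight_update(theta, f_demo, f_policy, i, alpha0=0.1):
        # One step of theta_{i+1} = theta_i + alpha(i) * (f_demo - f_policy):
        # move the reward weights so the learned policy's feature counts
        # approach those of the real demonstrations.
        alpha_i = alpha0 / (1.0 + i)        # illustrative decaying step size
        return theta + alpha_i * (f_demo - f_policy)

    # Toy usage with a hypothetical 2D-position feature map.
    phi = lambda s: np.array([s[0], s[1], 1.0])
    f_demo = feature_expectations([[(0.0, 1.0), (0.5, 0.8)]], phi)
    f_pi = feature_expectations([[(0.0, 0.5), (0.2, 0.4)]], phi)
    theta = irl_weight_update(np.zeros(3), f_demo, f_pi, i=0)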

Summary

Introduction

Reinforcement learning (RL) [1] is a challenging and promising machine learning field that has been extensively applied over the past years to problem domains such as robot control, simulation, quality control, and logistics [2,3,4,5]. Many RL approaches rely on function approximators, either linear or nonlinear, to represent the world states, the value function, and the actions. This provides the RL model with the necessary generalization power (it can work well with states or situations never seen during learning), but it comes at a price: the representation of the learned information is hidden inside the complex structure of the generalization system. The reward function, in turn, is the main tool for specifying the behavior to be learned, and relatively simple reward functions have provided realistic and robust behaviors in the pedestrian simulation domain, especially in group and crowd simulations [6,7]. Inverse Reinforcement Learning (IRL) takes this problem a step further by proposing a methodology for designing the reward function from real data provided as demonstrations of the desired behavior (see the discussion in Section 2.2).
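
To make the generalization/opacity trade-off concrete, here is a minimal linear value-function approximator trained with semi-gradient TD(0): the value of every state, including states never seen during learning, is read from one shared weight vector, so editing a single weight changes the values of many states at once. The feature dimension, learning rate, and discount factor are illustrative assumptions, not the configuration used in MARL-Ped.

    import numpy as np

    class LinearValueFunction:
        # v(s) ~= w . phi(s): the learned information lives in the shared
        # weight vector w rather than in a readable per-state table.
        def __init__(self, n_features, lr=0.05, gamma=0.99):
            self.w = np.zeros(n_features)
            self.lr = lr
            self.gamma = gamma

        def value(self, phi_s):
            return float(self.w @ phi_s)

        def td0_update(self, phi_s, reward, phi_next, done):
            # Semi-gradient TD(0): nudge w toward the bootstrapped target.
            target = reward + (0.0 if done else self.gamma * self.value(phi_next))
            td_error = target - self.value(phi_s)
            self.w += self.lr * td_error * phi_s
            return td_error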

Reinforcement Learning
Inverse Reinforcement Learning
Soft Q-Learning
Results
MDP Setup
IRL Setup
Experimental Results
Conclusions and Future Work