Abstract

Decision making and planning when state information is only partially available is a problem faced by all forms of intelligent entities, whether virtual, synthetic or biological. The standard approach to solving such a decision problem mathematically is to formulate it as a partially observable Markov decision process (POMDP) and apply the same optimisation techniques used for the Markov decision process (MDP). However, naively applying the methodology used to solve MDPs to POMDPs makes the problem computationally intractable. To address this, we take a programming-by-demonstration approach to solving the POMDP in continuous state and action space. In this work, we model the decision-making process followed by humans when searching blindly for an object on a table. We show that by representing the belief over the human's position in the environment with a particle filter (PF) and learning a mapping from this belief to their end-effector velocities with a Gaussian mixture model (GMM), we can model the human's search process and reproduce it on any agent. We further categorise the behaviours demonstrated by humans as either risk-prone or risk-averse, and find that more than 70% of the human searches were risk-averse. We contrast the performance of this human-inspired search model with greedy and coastal navigation search methods. Our evaluation metrics are the distance travelled to reach the goal and how each method minimises uncertainty. We further analyse the control policies of the coastal navigation and GMM search models and argue that taking uncertainty into account is more efficient with respect to distance travelled to reach the goal.
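To make the pipeline concrete, below is a minimal sketch of the belief representation and the belief-to-velocity mapping described above: a particle filter maintains the belief over the hand's position, the belief is compressed into a feature vector, and Gaussian mixture regression maps that vector to an end-effector velocity. The names (`update_belief`, `belief_features`, `GMMPolicy`) and the specific feature choice (belief mean plus covariance diagonal) are illustrative assumptions, not taken from the paper.

```python
# Sketch of the PF belief + GMM regression pipeline; names and features are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def update_belief(particles, weights, control, observation, obs_fn,
                  obs_noise=0.05, motion_noise=0.01):
    """One particle-filter step: propagate with the control, reweight by the observation."""
    particles = particles + control + rng.normal(0.0, motion_noise, particles.shape)
    predicted = obs_fn(particles)                        # expected sensor reading per particle
    lik = np.exp(-0.5 * ((predicted - observation) / obs_noise) ** 2)
    weights = weights * lik
    weights /= weights.sum() + 1e-12
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        particles, weights = particles[idx], np.full(len(weights), 1.0 / len(weights))
    return particles, weights

def belief_features(particles, weights):
    """Compress the belief into its mean and covariance diagonal (an assumed feature choice)."""
    mean = weights @ particles
    var = ((particles - mean) ** 2 * weights[:, None]).sum(axis=0)
    return np.concatenate([mean, var])

class GMMPolicy:
    """Gaussian mixture regression from belief features to end-effector velocity."""
    def __init__(self, priors, means, covs, dim_in):
        self.priors, self.means, self.covs, self.dim_in = priors, means, covs, dim_in

    def velocity(self, x):
        d = self.dim_in
        resp, out = [], []
        for pi, mu, sigma in zip(self.priors, self.means, self.covs):
            s_in, s_io = sigma[:d, :d], sigma[:d, d:]
            diff = x - mu[:d]
            norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(s_in))
            resp.append(pi * norm * np.exp(-0.5 * diff @ np.linalg.solve(s_in, diff)))
            out.append(mu[d:] + s_io.T @ np.linalg.solve(s_in, diff))
        resp = np.array(resp) / (np.sum(resp) + 1e-12)
        return np.sum(resp[:, None] * np.array(out), axis=0)

# Tiny usage example with one hand-set component (4-D belief features -> 2-D velocity).
dim_in, dim_out = 4, 2
policy = GMMPolicy(priors=[1.0], means=[np.zeros(dim_in + dim_out)],
                   covs=[np.eye(dim_in + dim_out)], dim_in=dim_in)
particles = rng.uniform(-0.5, 0.5, size=(200, 2))
weights = np.full(200, 1.0 / 200)
particles, weights = update_belief(particles, weights, control=np.array([0.01, 0.0]),
                                   observation=0.0, obs_fn=lambda p: np.linalg.norm(p, axis=1))
print(policy.velocity(belief_features(particles, weights)))
```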

Highlights

  • Acting under partial observability: learning controllers or policies to act within a context where the state space is only partially observable is of high relevance to all real robotic applications.

  • We analysed the types of behaviour present in the human demonstrations as well as in four different search algorithms, namely greedy, Gaussian mixture model (GMM), hybrid and coastal (a sketch contrasting greedy and coastal-style policies follows this list).

  • In this work, we have shown a novel approach to teaching a robot to act in a partially observable environment.
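To illustrate the contrast between the compared search strategies, here is a small hedged sketch: a greedy policy drives straight toward the goal estimated from the current belief, while a coastal-style policy trades goal progress against moving toward known features (e.g. table edges) when the belief is uncertain. The function names, landmark set and weight `k` are assumptions for illustration, not the paper's implementation.

```python
# Greedy vs. coastal-style action selection; landmarks and the weight k are assumptions.
import numpy as np

def greedy_velocity(belief_mean, goal, speed=0.05):
    """Head straight for the goal estimated from the current belief."""
    direction = goal - belief_mean
    return speed * direction / (np.linalg.norm(direction) + 1e-9)

def coastal_velocity(belief_mean, belief_cov, goal, landmarks, speed=0.05, k=0.5):
    """Trade progress to the goal against detouring toward landmarks that reduce uncertainty."""
    to_goal = goal - belief_mean
    nearest = landmarks[np.argmin(np.linalg.norm(landmarks - belief_mean, axis=1))]
    to_landmark = nearest - belief_mean
    uncertainty = np.trace(belief_cov)         # detour more when the belief is spread out
    direction = to_goal + k * uncertainty * to_landmark
    return speed * direction / (np.linalg.norm(direction) + 1e-9)

landmarks = np.array([[0.0, 0.0], [0.0, 0.4]])   # e.g. table edges the hand can touch
print(greedy_velocity(np.array([0.2, 0.2]), np.array([0.5, 0.2])))
print(coastal_velocity(np.array([0.2, 0.2]), 0.02 * np.eye(2),
                       np.array([0.5, 0.2]), landmarks))
```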


Summary

Introduction

Acting under partial observability: learning controllers or policies to act within a context where the state space is only partially observable is of high relevance to all real robotic applications. Because perceptual information is limited and inaccurate, often only an approximation of the environment is available at any given time. If this inherent uncertainty is not taken into account during planning or control, there is a non-negligible risk of missing goals, getting lost and wasting valuable resources. A common approach is to formulate the uncertainty present in both action and state as a partially observable Markov decision process (POMDP). POMDPs are an extensive area of research in the operational research, planning and decision theory communities [1,2]. The emphasis is on being able to act optimally with respect to an
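For readers unfamiliar with the formalism, the following is a minimal sketch of the discrete POMDP belief update (a Bayes filter) that underlies acting under partial observability; the two-state example and the transition and observation matrices `T` and `Z` are illustrative assumptions, not values from the paper.

```python
# Minimal discrete POMDP belief update; the two-state example is illustrative only.
import numpy as np

def belief_update(b, a, o, T, Z):
    """b: belief over states, T[a][s, s']: transition probs, Z[a][s', o]: observation probs."""
    predicted = b @ T[a]               # predict: sum_s b(s) * P(s' | s, a)
    updated = predicted * Z[a][:, o]   # correct: weight by P(o | s', a)
    return updated / updated.sum()

# Two-state example: the searching hand is either 'near' (0) or 'far' (1) from the object.
T = {0: np.array([[0.9, 0.1], [0.3, 0.7]])}   # action 0: move toward the likely goal
Z = {0: np.array([[0.8, 0.2], [0.2, 0.8]])}   # noisy contact sensor
b = np.array([0.5, 0.5])
b = belief_update(b, a=0, o=1, T=T, Z=Z)
print(b)  # belief shifts toward 'far' after observing no contact
```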

