Human and machine learning in non-Markovian decision making.

Aaron Michael Clarke, Walter Senn, Johannes Friedrich, Michael H Herzog, Elisa M Tartaglia, Silvia Marchesotti

doi:10.1371/journal.pone.0123105

Abstract

Humans can learn under a wide variety of feedback conditions. Reinforcement learning (RL), where a series of rewarded decisions must be made, is a particularly important type of learning. Computational and behavioral studies of RL have focused mainly on Markovian decision processes, where the next state depends only on the current state and action. Little is known about non-Markovian decision making, where the next state depends on more than the current state and action. Learning is non-Markovian, for example, when there is no unique mapping between actions and feedback. We have produced a model based on spiking neurons that can handle these non-Markovian conditions by performing policy gradient descent [1]. Here, we examine the model’s performance and compare it with human learning and a Bayes optimal reference, which provides an upper bound on performance. We find that, in all cases, our population-of-spiking-neurons model describes human performance well.
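
The distinction between Markovian and non-Markovian transitions can be illustrated with a toy sketch. This is not the task or the model used in the paper: the three-state environment, the rule inside each function, and all names are hypothetical and chosen only to show that a Markovian step needs just the current state and action, whereas a non-Markovian step also needs the history.

```python
from typing import List, Tuple

# Hypothetical toy environment: states are 0..2, actions are 0 or 1.
N_STATES = 3


def markov_step(state: int, action: int) -> int:
    """Markovian: the next state depends only on (state, action)."""
    return (state + 1 + action) % N_STATES


def non_markov_step(history: List[Tuple[int, int]], state: int, action: int) -> int:
    """Non-Markovian: the next state also depends on the previous action.

    `history` holds earlier (state, action) pairs. In this made-up rule,
    repeating the previous action leaves the agent where it is.
    """
    if history and history[-1][1] == action:
        return state
    return (state + 1 + action) % N_STATES


# Same current state and action, different histories, different outcomes.
after_repeat = non_markov_step([(0, 0)], state=1, action=0)
after_change = non_markov_step([(0, 1)], state=1, action=0)
```

In this sketch, `after_repeat` and `after_change` differ even though the current state and action are identical, which is exactly what a learner that conditions only on the current state cannot capture.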

Highlights

Typical laboratory experiments on human learning provide trial-by-trial feedback following each stimulus presentation.
The first non-Markovian learning situation we consider is learning with switch-states.
Participants completed as many episodes as they could in two sessions of 10 minutes each.

Summary

Introduction

Typical laboratory experiments on human learning provide trial-by-trial feedback following each stimulus presentation. In many other situations, however, feedback is delayed and sparse [2]. For example, several moves must be made before a player receives feedback about a game’s outcome (win or lose). From this feedback alone, it is impossible to infer directly whether a particular move was good or bad. Rewards might also vary from one learning situation to the next; for instance, different apples within an orchard might have different tastes. Learning in these situations is well described by reinforcement learning (RL) models.
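
The problem of delayed, sparse feedback can be made concrete with a small sketch. It assumes a made-up four-move game in which only the final move is rewarded, and it uses the discounted return, a standard RL quantity, to spread that single outcome back over the earlier moves. The discount factor `gamma` and the function name are illustrative assumptions, not details taken from the paper.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Return, for each step, the discounted sum of rewards from that step on."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: three unrewarded moves, then a win worth 1.0.
rewards = [0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(rewards)
```

The per-move rewards say nothing about the first three moves, while the returns assign each of them a share of the final outcome, with earlier moves credited less.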

Methods

Results

Conclusion

Journal: PLOS ONE
Publication Date: Apr 21, 2015
Citations: 41
License type: CC BY 4.0
