Statistical Mechanics of the Delayed Reward-Based Learning with Node Perturbation

Hiroshi Saito, Kazuo Okanoya, Kentaro Katahira, Masato Okada

doi:10.1143/jpsj.79.064003

Abstract

In reward-based learning, a reward is typically given with some delay after the behavior that causes it. In the machine learning literature, the eligibility trace has been used as one way of handling delayed reward in reinforcement learning. Recent studies suggest that the eligibility trace is also important for a difficult problem in neuroscience known as the “distal reward problem”. Node perturbation is a stochastic gradient method, one of many implementations of reinforcement learning, that estimates an approximate gradient by introducing perturbations into a network. Since such a stochastic gradient method does not require the derivative of an objective function, it is expected to be able to account for the learning mechanism of a complex system such as the brain. We study node perturbation with the eligibility trace as a specific example of delayed reward-based learning and analyze it using a statistical mechanics approach. As a result, we show the optimal time c...
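
As a rough illustration of the idea in the abstract, the following is a minimal sketch of node perturbation combined with an eligibility trace under a delayed reward. It is not the model analyzed in the paper: the teacher-student linear perceptron, the exponentially decaying trace, and every name and parameter value (`train_delayed_node_perturbation`, `delay`, `decay`, `eta`, `sigma`) are assumptions chosen only for illustration.

```python
from collections import deque

import numpy as np


def train_delayed_node_perturbation(n_inputs=20, n_steps=20000, delay=2,
                                    decay=0.5, eta=0.05, sigma=0.1, seed=0):
    """Toy delayed-reward node perturbation (hypothetical setup, not the paper's)."""
    rng = np.random.default_rng(seed)
    teacher = rng.standard_normal(n_inputs)   # assumed fixed target weights
    student = np.zeros(n_inputs)              # weights being learned
    trace = np.zeros(n_inputs)                # eligibility trace
    pending = deque()                         # error differences not yet delivered
    errors = []

    for _ in range(n_steps):
        # Random input with squared norm close to 1.
        x = rng.standard_normal(n_inputs) / np.sqrt(n_inputs)
        xi = sigma * rng.standard_normal()    # perturbation added to the output node

        target = teacher @ x
        y = student @ x
        base_err = 0.5 * (target - y) ** 2
        pert_err = 0.5 * (target - (y + xi)) ** 2

        # The change in error caused by the perturbation acts as the
        # (negative) reward; it becomes available only `delay` steps later.
        pending.append(pert_err - base_err)

        # The trace remembers past perturbation-input products with
        # exponential decay.
        trace = decay * trace + xi * x

        if len(pending) > delay:
            delta = pending.popleft()         # signal generated `delay` steps ago
            student -= eta * delta / sigma**2 * trace

        # Generalization error for inputs with covariance I / n_inputs.
        errors.append(0.5 * np.sum((teacher - student) ** 2) / n_inputs)

    return np.array(errors)


errors = train_delayed_node_perturbation()
print("initial error:", errors[0])
print("final error:", errors[-1])
```

In this sketch the delayed signal is correlated only with the trace component left by the perturbation that caused it, which is attenuated by `decay ** delay`; the other trace components contribute zero-mean noise. This gives one intuition for why the decay time constant of the trace involves a trade-off, although the setup here is far simpler than a statistical mechanics analysis.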

Journal: Journal of the Physical Society of Japan
Publication Date: Jun 15, 2010
Citations: 1

Similar Papers

Opposition-Based Q(λ) Algorithm
M Shokri ... M Kamel
01 Jan 2006

Use of the knowledge which is independence on reward in reinforcement learning
Yoshiki Miyazaki ... Kentarou Kurashige
01 Dec 2009

Stochastic gradient method with accelerated stochastic dynamics
Masayuki Ohzeki
Journal of Physics: Conference Series | VOL. 699
01 Mar 2016

Statistical mechanics of structural and temporal credit assignment effects on learning in neural networks
Hiroshi Saito ... Kentaro Katahira
Physical Review E | VOL. 83
20 May 2011
