Abstract

Reinforcement learning is a sub-area of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward (Sutton & Barto, 1998). Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. Temporal Difference (TD) learning is one such reinforcement learning algorithm; it combines ideas from Monte Carlo methods and dynamic programming (DP). TD resembles a Monte Carlo method because it learns by sampling the environment according to some policy, and it is related to dynamic programming because it updates its current estimate on the basis of previously learned estimates. The actor-critic method (Witten, 1977) builds on TD learning and consists of two parts: (1) an actor, which selects actions, and (2) a critic, which evaluates actions and states. Neural networks, meanwhile, are attracting much attention as a means of realizing flexible information processing. They are modeled on groups of neurons in biological brains and imitate them technologically; among their important features is the ability to learn, and thereby to acquire information-processing capabilities. By combining the flexible information-processing ability of neural networks with the adaptive learning ability of reinforcement learning, several reinforcement learning methods using neural networks have been proposed (Shibata et al., 2001; Ishii et al., 2005; Shimizu & Osana, 2008). In this research, we propose a reinforcement learning method using the Kohonen Feature Map Probabilistic Associative Memory based on Weights Distribution (KFMPAM-WD) (Osana, 2009). The proposed method is based on the actor-critic method, and the actor is realized by the KFMPAM-WD. The KFMPAM-WD is based on the self-organizing feature map (Kohonen, 1994), and it can realize successive learning and one-to-many associations. The proposed method makes use of this property in order to realize learning during the execution of a task.
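To make the actor-critic structure described above concrete, the following is a minimal sketch of a tabular actor-critic with a TD(0) critic on a toy chain environment. It illustrates only the generic framework (actor selects the action, critic evaluates it via the TD error); it is not the paper's KFMPAM-WD actor, and the environment, state/action counts, and learning rates are assumptions chosen for the illustration.

    # Minimal tabular actor-critic with a TD(0) critic (generic illustration,
    # not the KFMPAM-WD actor proposed in the paper).
    import numpy as np

    n_states, n_actions = 5, 2               # toy chain world: move left (0) or right (1)
    alpha_v, alpha_p, gamma = 0.1, 0.1, 0.95 # assumed learning rates and discount factor

    V = np.zeros(n_states)                   # critic: state-value estimates
    prefs = np.zeros((n_states, n_actions))  # actor: action preferences (softmax policy)

    def policy(s):
        # Softmax over the actor's preferences for state s.
        p = np.exp(prefs[s] - prefs[s].max())
        p /= p.sum()
        return np.random.choice(n_actions, p=p)

    def step(s, a):
        # Toy dynamics: reward 1 only when the right end of the chain is reached.
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        reward = 1.0 if s_next == n_states - 1 else 0.0
        done = s_next == n_states - 1
        return s_next, reward, done

    for episode in range(500):
        s, done = 0, False
        while not done:
            a = policy(s)                        # actor selects the action
            s_next, r, done = step(s, a)
            target = r + (0.0 if done else gamma * V[s_next])
            td_error = target - V[s]             # critic evaluates the action and state
            V[s] += alpha_v * td_error           # TD(0) update of the critic
            prefs[s, a] += alpha_p * td_error    # actor is reinforced by the critic's signal
            s = s_next

    print("Learned state values:", np.round(V, 3))

In the proposed method, the tabular actor in this sketch would be replaced by the KFMPAM-WD, whose successive learning and one-to-many association properties allow the actor to be trained while the task is being performed.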
