Abstract

A recurrent spiking neural network is proposed that implements planning as probabilistic inference for finite and infinite horizon tasks. The architecture splits this problem into two parts. First, the stochastic transient firing of the network embodies the dynamics of the planning task. Second, appropriate injected input shapes these dynamics to generate high-reward state trajectories. A general class of reward-modulated plasticity rules for these afferent synapses is presented. The updates optimize the likelihood of getting a reward through a variant of an Expectation Maximization algorithm, and learning is guaranteed to converge to a local maximum. We find that the network dynamics are qualitatively similar to transient firing patterns during planning and foraging in the hippocampus of awake behaving rats. The model extends classical attractor models and provides a testable prediction on identifying modulating contextual information. The ability to represent multiple task solutions is investigated in a reaching and obstacle-avoidance task on a real robot arm. With its local update rules, the neural planning method provides a basis for future neuromorphic hardware implementations, with promising capabilities such as large-scale data processing and early initiation of strategies to avoid dangerous situations in robot co-worker scenarios.



Summary

Introduction

A recurrent spiking neural network is proposed that implements planning as probabilistic inference for finite and infinite horizon tasks. The architecture splits this problem into two parts: the stochastic transient firing of the network embodies the dynamics of the planning task. In parallel to this development, the probabilistic inference perspective has been successfully used in cognitive science and neuroscience for modeling how biological organisms solve planning problems [3,4,5,6]. However, it was not clear how probabilistic planning can be implemented in neural substrates with biologically realistic learning rules. Optimal reward-modulated Hebbian learning rules are derived that implement planning as probabilistic inference in the spiking network through iterative local updates. The presented neural model allows such non-straight-line paths to be learned from single
www.nature.com/scientificreports/
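To make the idea of shaping stochastic dynamics with reward-modulated updates more concrete, the following is a minimal toy sketch, not the spiking model or the learning rule derived in the paper. It assumes a hypothetical one-dimensional task (seven states, a random-walk transition model, a binary reward for ending at a goal state) and a per-time-step input bias that plays the role of the injected input. The update, which moves each bias toward the states visited on rewarded trajectories, is a simplified heuristic chosen for illustration; all names and parameter values are assumptions.

```python
import math
import random

# Hypothetical toy task: all constants below are illustrative assumptions.
N_STATES = 7          # states 0..6 on a line
START, GOAL = 3, 6
HORIZON = 6           # number of transitions per sampled trajectory


def step_probs(s, bias_t):
    """Next-state distribution: random-walk dynamics reweighted by exp(bias)."""
    weights = [0.0] * N_STATES
    for d in (-1, 0, 1):
        nxt = min(N_STATES - 1, max(0, s + d))   # clip moves at the walls
        weights[nxt] += math.exp(bias_t[nxt]) / 3.0
    total = sum(weights)
    return [w / total for w in weights]


def sample_trajectory(bias, rng):
    """Sample one state trajectory of length HORIZON starting from START."""
    s, traj = START, []
    for t in range(HORIZON):
        s = rng.choices(range(N_STATES), weights=step_probs(s, bias[t]))[0]
        traj.append(s)
    return traj


def visit_freq(trajs):
    """Fraction of trajectories that occupy state s at time step t."""
    freq = [[0.0] * N_STATES for _ in range(HORIZON)]
    for traj in trajs:
        for t, s in enumerate(traj):
            freq[t][s] += 1.0 / len(trajs)
    return freq


def learn(n_iters=40, n_samples=400, lr=1.0, seed=0):
    rng = random.Random(seed)
    bias = [[0.0] * N_STATES for _ in range(HORIZON)]   # stand-in for injected input
    success = []
    for _ in range(n_iters):
        trajs = [sample_trajectory(bias, rng) for _ in range(n_samples)]
        rewarded = [tr for tr in trajs if tr[-1] == GOAL]   # binary reward
        success.append(len(rewarded) / n_samples)
        if not rewarded:
            continue   # no reward signal in this batch, so no update
        f_all, f_rew = visit_freq(trajs), visit_freq(rewarded)
        for t in range(HORIZON):
            for s in range(N_STATES):
                # Reward-modulated heuristic: raise the bias of states that
                # rewarded trajectories visit more often than average.
                bias[t][s] += lr * (f_rew[t][s] - f_all[t][s])
    return bias, success


bias, success = learn()
print(f"success rate: first iteration {success[0]:.2f}, "
      f"last iteration {success[-1]:.2f}")
```

The transition model stays fixed throughout; only the bias changes, which mirrors the split between task dynamics and shaping input described above. Each update uses only visit statistics for a single state and time step, which loosely corresponds to the locality of the rules, though the sketch makes no claim about convergence.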

