Motivated Optimal Developmental Learning for Sequential Tasks Without Using Rigid Time-Discounts

Dongshu Wang, Juyang Weng, Yihai Duan

doi:10.1109/tnnls.2017.2762720

Abstract

Many reinforcement-learning methods, such as Q-learning, use symbolic (nonemergent) representations. Here we use emergent representations, without human-handcrafted symbolic states (i.e., states in which each one corresponds to a different location). This paper models reinforcement learning for hidden neurons in emergent networks that perform sequential tasks. We investigate the influence of reinforcers on such tasks (e.g., robot navigation in different scenarios), in which the learned value and outcome of a behavior depend not only on the current experience, as in (episodic) pattern recognition, but also on predictions of future experiences (e.g., delayed rewards) and environments (e.g., previously learned navigation trajectories). We show that this new model of motivated learning amounts to computing the maximum-likelihood estimate through "life," in which punishments and rewards receive increased weights. This formulation avoids the greediness of the time discount in Q-learning. Although the resulting sequential optimization is complex and nonlinear, it is solved by a closed-form procedure under limited computational resources and the limited learning experience so far, because we convert it into a simpler problem of incremental, linear estimation. The experimental results show that the serotonin and dopamine systems speed up learning for sequential tasks, because not all events are equally important. To our knowledge, this is the first work to study the influence of reinforcers (via serotonin and dopamine) on hidden neurons (Y neurons) for sequential tasks in dynamic scenarios using emergent representations.
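To make the phrase "incremental and linear estimation" in which "punishment and reward have increased weights" more concrete, the sketch below shows a generic incremental weighted mean: each new observation moves the estimate linearly toward itself, and after any number of updates the estimate equals the batch weighted mean, without storing past samples. This is an illustrative assumption, not the authors' network update. The class `WeightedIncrementalMean`, the helper `event_weight`, and the constants `NEUTRAL_WEIGHT` and `REINFORCED_WEIGHT` are hypothetical, and the weight values are arbitrary rather than taken from the paper. Note that the weight here depends on whether an event was reinforced, not on how far in the future it occurred, which contrasts with Q-learning's time discount.

```python
from typing import List, Sequence

# Hypothetical importance weights (assumptions, not values from the paper):
# a reinforced event (reward or punishment) counts as several ordinary events.
NEUTRAL_WEIGHT = 1.0
REINFORCED_WEIGHT = 4.0


class WeightedIncrementalMean:
    """Running weighted mean of fixed-length vectors.

    After n updates, `mean` equals sum(w_i * x_i) / sum(w_i), the weighted
    sample mean, but only O(dim) state is kept: no past samples are stored.
    """

    def __init__(self, dim: int) -> None:
        self.mean: List[float] = [0.0] * dim
        self.total_weight = 0.0

    def update(self, x: Sequence[float], weight: float = NEUTRAL_WEIGHT) -> List[float]:
        if weight <= 0:
            raise ValueError("weight must be positive")
        if len(x) != len(self.mean):
            raise ValueError("dimension mismatch")
        self.total_weight += weight
        # Per-sample learning rate: this sample's share of all weight seen so far.
        rate = weight / self.total_weight
        # Linear update: move the estimate toward x by `rate`.
        self.mean = [(1.0 - rate) * m + rate * xi for m, xi in zip(self.mean, x)]
        return self.mean


def event_weight(reinforced: bool) -> float:
    """Hypothetical rule: rewarded or punished events get a larger weight."""
    return REINFORCED_WEIGHT if reinforced else NEUTRAL_WEIGHT


if __name__ == "__main__":
    est = WeightedIncrementalMean(dim=2)
    est.update([0.0, 0.0], event_weight(False))
    print(est.update([4.0, 4.0], event_weight(True)))  # [3.2, 3.2]
```

With equal weights the two samples would average to `[2.0, 2.0]`; giving the reinforced sample four times the weight pulls the estimate to `[3.2, 3.2]`, which illustrates how a more important event can dominate an estimate without any discounting by delay.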

Full Text