Abstract

Neural networks learn flexible input-output associations by changing their synaptic weights. Their representational performance and learning dynamics are studied intensively in several fields. When only incomplete performance evaluations are available, neural networks face the "credit assignment problem": the network must assign credit or blame to its behaviors according to their contribution to network performance. In reinforcement learning, only a scalar evaluation signal is delivered to the network. The two main types of credit assignment problem in reinforcement learning are structural and temporal, that is, determining which network parameters (structural) and which past network activities (temporal) are related to an evaluation signal given by the environment. In this study, we apply statistical mechanical analysis to the learning process of a simple neural network model to clarify the effects of the two kinds of credit assignment and their interaction. Our model is based on node perturbation learning with an eligibility trace. Node perturbation is a stochastic gradient learning method that solves the structural credit assignment problem by introducing a perturbation into the system output. The eligibility trace preserves past network activities with a temporal credit in order to deal with the delay of the instruction signal. We show that the two credit assignment effects interact, and that the optimal time constant of the eligibility trace depends not only on the evaluation delay but also on the network size.
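
The mechanism described above can be illustrated with a minimal sketch. This is not the model analyzed in the paper. It assumes a linear student network with a single output learning from a hypothetical linear teacher, a squared-error evaluation, and an exponentially decaying trace. The names `simulate`, `eta`, `sigma`, `delay`, and `decay` are illustrative assumptions, with `decay` standing in for the time constant of the eligibility trace.

```python
from collections import deque

import numpy as np


def simulate(n_in=10, steps=5000, eta=0.1, sigma=0.01, delay=0, decay=0.0, seed=0):
    """Node perturbation with an eligibility trace (illustrative sketch only).

    Returns (initial, final) squared distance between student and teacher weights.
    """
    rng = np.random.default_rng(seed)
    w_teacher = rng.standard_normal(n_in)  # hypothetical teacher weights
    w = np.zeros(n_in)                     # student weights
    trace = np.zeros(n_in)                 # eligibility trace over past activity
    pending = deque()                      # evaluations not yet delivered
    initial_error = float(np.sum((w - w_teacher) ** 2))

    for _ in range(steps):
        x = rng.standard_normal(n_in) / np.sqrt(n_in)  # input with |x|^2 near 1
        target = w_teacher @ x
        y = w @ x
        xi = sigma * rng.standard_normal()             # perturbation on the output

        # Scalar evaluation: change in squared error caused by the perturbation.
        err_base = 0.5 * (target - y) ** 2
        err_pert = 0.5 * (target - (y + xi)) ** 2

        # The trace remembers which perturbation-input pairs occurred recently.
        trace = decay * trace + xi * x

        # The evaluation reaches the learner only after `delay` further steps.
        pending.append(err_pert - err_base)
        if len(pending) > delay:
            evaluation = pending.popleft()
            w -= (eta / sigma**2) * evaluation * trace

    final_error = float(np.sum((w - w_teacher) ** 2))
    return initial_error, final_error


if __name__ == "__main__":
    print("no delay, no trace:", simulate(delay=0, decay=0.0))
    print("delay 3, decay 0.8:", simulate(delay=3, decay=0.8))
```

With `delay=0` and `decay=0.0` the update reduces to plain node perturbation. With a delayed evaluation, the perturbation that caused it survives in the trace only with weight `decay**delay`, while more recent, unrelated perturbations add noise. That trade-off is one informal way to see why the best trace decay could depend on both the delay and the input dimension `n_in`.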

