Abstract

Biologically-inspired neural controller based on adaptive reward learning

John Nassour1* and Gordon Cheng1

1 Technische Universität München, Germany

Humans learn tasks from their own experience, through self-exploration and by observing the actions of others. The evaluation of an achieved task is driven by rewards, and humans can improve their skills in order to gain more reward (e.g. happiness, food, or money). Neurobiological studies of cortical activity suggest that the orbitofrontal cortex (OFC) handles reward in the human brain [1]. OFC neurons form the key reward structure of the brain, where reward is coded in an adaptive and flexible way [2]. Studies of the anterior cingulate cortex (ACC) suggest that it is responsible for avoiding the repetition of mistakes [3]. This cortical area acts as an early warning system (EWS) that adjusts behavior to avoid dangerous situations. It responds not only to the sources of errors (external error feedback), but also to the earliest sources of error information available (internal error detection) [4]. The EWS has been shown to be affected by tolerance to risk; psychological studies provide further evidence that people's strategies fall into two classes, risk taking and risk aversion [5]. "NeuroRobotics" research draws on human learning methods to improve the autonomy and robustness of robots in dealing with environmental changes. In connection with these neurological studies, we propose a learning method based on human learning from experience (ACC) and inspired by the way the human brain codes reward (OFC), in order to allow a humanoid robot to learn a walking task. With the vigilance-threshold concept, which represents tolerance to risk, the method guarantees a balance between exploration and exploitation, unlike other search methods (e.g. Q-learning, Monte Carlo). Furthermore, it is able to converge to multiple learning targets.
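The role of the vigilance threshold in balancing exploration and exploitation can be sketched as follows. This is a minimal illustration, not the controller from the paper: the reward estimates, the threshold value, and the action set are all assumptions introduced here for clarity.

```python
import random

def select_action(q_values, vigilance, rng=random):
    """Choose an action under a vigilance threshold.

    Exploit the best-known action when its estimated reward clears the
    vigilance threshold (risk-averse behaviour); otherwise explore a
    random action (risk-taking behaviour).

    q_values  -- dict mapping each action to its estimated reward
    vigilance -- scalar threshold representing tolerance to risk
    """
    best_action = max(q_values, key=q_values.get)
    if q_values[best_action] >= vigilance:
        # A sufficiently rewarding action is known: exploit it.
        return best_action
    # No action is "safe enough" yet: explore.
    return rng.choice(list(q_values))
```

A high vigilance (low risk tolerance) keeps the agent exploring until a reliably rewarding action is found; a low vigilance lets it commit early to the best estimate, mirroring the risk-taking versus risk-aversion classes discussed above.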
Most reward-based task-learning methods use predefined parameters in their reward function [6], which cannot be obtained without previous experience of the desired task. Learning based on an adaptive reward does not require any prior information about the reward; it builds experience from scratch, using only the reward information that becomes available. Our approach has been implemented on the NAO humanoid robot, controlled by a bio-inspired neural controller based on a central pattern generator (CPG). The learning system adapts the oscillation frequency and the motor-neuron gains in pitch and roll in order to walk on flat and sloped terrain, and to switch between them.
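As an illustration of the kind of pattern generator such a learning system tunes, here is a minimal phase-oscillator CPG in which the frequency and the pitch/roll output gains are the adapted parameters. The equations and names are a sketch assumed for illustration, not the actual neural controller run on NAO.

```python
import math

def cpg_step(phase, dt, frequency, gain_pitch, gain_roll):
    """Advance a single phase oscillator by one time step dt (seconds).

    `frequency` (Hz) and the two output gains are the parameters a
    reward-driven learner would adapt, e.g. when moving from flat to
    sloped terrain. Returns the new phase and the joint commands.
    """
    phase = (phase + 2.0 * math.pi * frequency * dt) % (2.0 * math.pi)
    pitch_cmd = gain_pitch * math.sin(phase)           # pitch joints
    roll_cmd = gain_roll * math.sin(phase + math.pi)   # roll joints, anti-phase
    return phase, pitch_cmd, roll_cmd
```

Running `cpg_step` in a loop yields rhythmic joint trajectories; between trials, the learner would adjust `frequency`, `gain_pitch`, and `gain_roll` according to the reward obtained.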
