Abstract

Uncovering the reward function of optimal controllers is crucial for determining the desired performance that an expert wants to inject into a given dynamical system. In this paper, a reward-inference algorithm for discrete-time expert controllers is proposed. The approach is inspired by the complementary mechanisms of the striatum, neocortex, and hippocampus for decision making and experience transfer. These systems work together to infer the reward function associated with the expert's controller, combining the complementary merits of data-driven and online learning methods. The proposed approach models the neocortex as two independent learning algorithms: a Q-learning algorithm and a gradient identification rule. The hippocampus is modelled by a least-squares update rule that extracts the relation between the states and control inputs in the expert's data. The striatum is modelled by an inverse optimal control algorithm that iteratively finds the hidden reward function. Lyapunov stability theory is used to establish the stability and convergence of the proposed approach. Simulation studies demonstrate the effectiveness of the proposed complementary learning algorithm.
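
The abstract describes a three-module architecture. The sketch below is a minimal, hypothetical Python illustration of how the three roles could interact on a linear-quadratic example: a least-squares fit of the expert gain stands in for the hippocampus, a regression-based identification of the dynamics stands in for the paper's Q-learning and gradient identification rules (neocortex), and a generic numerical search over diagonal reward weights stands in for the paper's iterative inverse optimal control update (striatum). The example system, the function names `lqr_gain` and `infer_reward`, and the optimization method are assumptions for illustration, not the paper's algorithm.

```python
# Hypothetical sketch of the complementary-learning idea on an LQR example.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Ground truth used only to generate the expert's demonstration data.
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # system matrix
B = np.array([[0.0], [0.1]])             # input matrix
Q_true = np.diag([2.0, 1.0])             # hidden state cost
R_true = np.array([[0.5]])               # hidden input cost

def lqr_gain(A, B, Q, R, iters=500):
    """Discrete-time LQR gain via Riccati iteration."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

K_expert = lqr_gain(A, B, Q_true, R_true)

# Expert demonstrations: u_k = -K_expert x_k plus small exploration noise
# (needed so the dynamics are identifiable from the data).
X = rng.normal(size=(200, 2))
U = -(K_expert @ X.T).T + 0.01 * rng.normal(size=(200, 1))
X_next = (A @ X.T + B @ U.T).T

# "Hippocampus": least-squares extraction of the expert policy from (x, u).
K_hat = -np.linalg.lstsq(X, U, rcond=None)[0].T

# "Neocortex" stand-in: regression-based identification of the dynamics.
Theta = np.linalg.lstsq(np.hstack([X, U]), X_next, rcond=None)[0].T
A_hat, B_hat = Theta[:, :2], Theta[:, 2:]

# "Striatum" stand-in: search for reward weights whose optimal controller
# reproduces the expert gain (the reward is only identifiable up to scale,
# so the input weight R is fixed to the identity).
def infer_reward(A, B, K_target):
    R = np.eye(1)
    def loss(log_q):
        Q = np.diag(np.exp(log_q))        # keep the weights positive
        return np.linalg.norm(lqr_gain(A, B, Q, R) - K_target) ** 2
    res = minimize(loss, x0=np.zeros(2), method="Nelder-Mead")
    return np.diag(np.exp(res.x))

Q_hat = infer_reward(A_hat, B_hat, K_hat)
print("recovered state weights:", np.diag(Q_hat))  # ~ Q_true / R_true scale
```

Under these assumptions the recovered weights match the hidden cost up to the usual scale ambiguity of inverse optimal control; the paper's own modules (Q-learning, gradient identification, and the Lyapunov-based iterative update) would replace the batch regressions and the Nelder-Mead search used here.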
