Abstract

In complex and nonlinear environments, the representational capacity of neural networks is needed to learn nonlinear rewards. Over recent years, the maximum entropy deep inverse reinforcement learning algorithm (ME-DIRL) has been increasingly applied to learning such rewards. However, when expert demonstration data are limited and imbalanced, or when the model suffers from heavy computation or overfitting, learning nonlinear rewards remains challenging. We propose a novel ME-DIRL with the AdaBoost algorithm (AME-DIRL) to address these issues. AME-DIRL uses the AdaBoost algorithm to combine multiple ME-DIRL processes into a strong learner, thereby overcoming the imbalance of the data set. Furthermore, to reduce the computational burden, a truncated gradient (TG) method is applied to sparsify the rewards obtained by the strong learner, which lowers the model complexity. To prevent overfitting, a correction factor is then added to the linear combination of weak learners. AME-DIRL models the relationship between input features and output rewards, approximating the rewards with a convolutional neural network (CNN) with scaled exponential linear units (SELUs). Numerical results indicate that the proposed AME-DIRL achieves higher accuracy in learning rewards than several classical inverse reinforcement learning (IRL) algorithms.
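
To make the pipeline described above concrete, the following minimal sketch (in PyTorch) shows a SELU-activated CNN reward approximator and an AdaBoost-style weighted combination of several such weak learners, scaled by a correction factor and sparsified by a simple truncation step. The class and function names (RewardCNN, combine_weak_rewards, truncate), the network architecture, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class RewardCNN(nn.Module):
    """Maps a stack of state feature maps to a per-state reward map."""

    def __init__(self, in_channels: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.SELU(),  # scaled exponential linear units
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.SELU(),
            nn.Conv2d(hidden, 1, kernel_size=1),  # one reward per state cell
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features).squeeze(1)  # (batch, H, W) reward map


def truncate(r: torch.Tensor, theta: float = 0.1) -> torch.Tensor:
    """Zero out small-magnitude rewards to encourage sparsity
    (a simplified stand-in for the truncated gradient step; assumed form)."""
    return torch.where(r.abs() < theta, torch.zeros_like(r), r)


def combine_weak_rewards(weak_nets, alphas, features, correction: float = 1.0):
    """AdaBoost-style strong reward: weighted sum of weak ME-DIRL rewards,
    scaled by a correction factor (assumed form) and then sparsified."""
    with torch.no_grad():
        rewards = torch.stack([net(features) for net in weak_nets])  # (M, B, H, W)
        weights = torch.tensor(alphas).view(-1, 1, 1, 1)  # per-learner boosting weights
        strong = correction * (weights * rewards).sum(dim=0)
    return truncate(strong)


if __name__ == "__main__":
    feats = torch.randn(4, 8, 16, 16)  # batch of 16x16 grids with 8 feature channels
    weak = [RewardCNN(in_channels=8) for _ in range(3)]
    strong_reward = combine_weak_rewards(weak, [0.5, 0.3, 0.2], feats)
    print(strong_reward.shape)  # torch.Size([4, 16, 16])
```

In an actual boosting loop, the per-learner weights would be computed from each weak learner's weighted error on the expert demonstrations; here they are fixed only to illustrate the combination step.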
