Abstract

Inverse reinforcement learning (IRL) is commonly used in deep reinforcement learning systems for tasks in which a reward function is difficult to design by hand. When the task is highly complex, the expert demonstration trajectories collected from humans often reflect different preferences, which leads to large variance in the learned reward function. To address this problem, this study proposes a behavior fusion method based on adversarial IRL. We decompose a complex task into several simple subtasks according to these different preferences. After decoupling the task, we exploit the inherent relationship between IRL and the generative adversarial network (GAN): the discriminator network fits the reward function and the generator network fits the policy, so the reward function and the policy are learned separately. Moreover, we improve the adversarial IRL model by assigning one discriminator to each subtask, which yields a more efficient update of the whole structure. Behavior fusion in this work applies a weighting network over the reward functions of the different subtasks. The proposed method is evaluated against baseline methods on the Atari Enduro racing game, and we further implement a wafer inspection experiment for additional discussion. The experimental results show that our method learns better policies on complicated tasks and that its training process is more stable.
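To make the architecture described above concrete, the following is a minimal sketch (not the authors' code) of how per-subtask discriminators and a state-dependent weighting network could fuse rewards. The module names, network sizes, and the exact fusion rule are assumptions; the abstract only specifies that each subtask has its own discriminator and that a weighting network combines the subtask reward functions.

```python
# Illustrative sketch: multi-discriminator adversarial IRL with behavior fusion.
# All class names, layer sizes, and the softmax fusion rule are assumptions.
import torch
import torch.nn as nn

class SubtaskDiscriminator(nn.Module):
    """One discriminator per subtask; its logit serves as a learned reward signal."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))  # raw logit D_k(s, a)

class BehaviorFusionReward(nn.Module):
    """Weighting network that combines per-subtask rewards into one scalar reward."""
    def __init__(self, obs_dim: int, act_dim: int, num_subtasks: int):
        super().__init__()
        self.discriminators = nn.ModuleList(
            SubtaskDiscriminator(obs_dim, act_dim) for _ in range(num_subtasks)
        )
        self.weight_net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, num_subtasks),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # AIRL-style per-subtask reward: log D_k - log(1 - D_k) equals the logit.
        rewards = torch.cat([d(obs, act) for d in self.discriminators], dim=-1)
        weights = torch.softmax(self.weight_net(obs), dim=-1)  # state-dependent mixture
        return (weights * rewards).sum(dim=-1, keepdim=True)   # fused reward for the policy

# Usage: fuse rewards for a batch of 8 transitions across 3 hypothetical subtasks.
if __name__ == "__main__":
    fusion = BehaviorFusionReward(obs_dim=10, act_dim=4, num_subtasks=3)
    obs, act = torch.randn(8, 10), torch.randn(8, 4)
    print(fusion(obs, act).shape)  # torch.Size([8, 1])
```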
