Abstract

In recent years, deep reinforcement learning (RL) has achieved excellent performance in robot control, video games, and multi-agent systems. However, most existing RL models generalize poorly: even a small visual change can sharply degrade an agent's performance, which limits the generalization and flexibility of RL in real-world applications. To address this problem, we propose a two-stage model in which RL agents learn to adapt to changes in the visual environment before learning optimal behavioral policies. In the first stage, we employ domain adaptation to align the distributions of state representations from different domains in the latent feature space. Specifically, we introduce a multi-granularity adversarial loss, at both the feature level and the pixel level, to constrain the learning of domain-invariant state representations. In the second stage, the RL agent is trained on the learned domain-invariant state representations. Because the adjusted observations are domain-invariant, the learned policy generalizes well across domains. We name the proposed method Adversarial-based Domain Invariant State Representation (Ad-DISR). Finally, we evaluate Ad-DISR on several variants of the Car-Racing game and on CARLA, an autonomous driving simulator. The results show that our method achieves better reward scores and longer survival times in both the source and target domains.
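
As a concrete illustration of the first stage, the sketch below shows GAN-style feature-level adversarial alignment between source- and target-domain observations. It is a minimal sketch under assumed details, not the paper's implementation: the module and function names (Encoder, DomainDiscriminator, feature_adversarial_step), the network sizes, the 84x84 observation size, and the binary cross-entropy objective are all illustrative choices. The pixel-level adversarial term described in the abstract would follow the same pattern, applied to reconstructed or translated images rather than latent features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Maps image observations to latent state representations."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # 64 * 19 * 19 is the flattened size for 84x84 RGB observations.
        self.fc = nn.Linear(64 * 19 * 19, latent_dim)

    def forward(self, obs):
        return self.fc(self.conv(obs))


class DomainDiscriminator(nn.Module):
    """Predicts whether a latent state comes from the source or the target domain."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, z):
        return self.net(z)


def feature_adversarial_step(encoder, disc, enc_opt, disc_opt, src_obs, tgt_obs):
    """One feature-level adversarial update: the discriminator learns to separate
    the domains, then the encoder is updated to make target features indistinguishable
    from source features, pushing both domains toward a shared latent distribution."""
    z_src, z_tgt = encoder(src_obs), encoder(tgt_obs)
    real = torch.ones(src_obs.size(0), 1)
    fake = torch.zeros(tgt_obs.size(0), 1)

    # Discriminator step (encoder detached): source -> 1, target -> 0.
    d_loss = (F.binary_cross_entropy_with_logits(disc(z_src.detach()), real)
              + F.binary_cross_entropy_with_logits(disc(z_tgt.detach()), fake))
    disc_opt.zero_grad()
    d_loss.backward()
    disc_opt.step()

    # Encoder step: fool the discriminator on target-domain features.
    g_loss = F.binary_cross_entropy_with_logits(disc(encoder(tgt_obs)), real)
    enc_opt.zero_grad()
    g_loss.backward()
    enc_opt.step()
    return d_loss.item(), g_loss.item()


if __name__ == "__main__":
    enc, disc = Encoder(), DomainDiscriminator()
    enc_opt = torch.optim.Adam(enc.parameters(), lr=1e-4)
    disc_opt = torch.optim.Adam(disc.parameters(), lr=1e-4)
    # Random batches standing in for 84x84 RGB frames from the two domains.
    src, tgt = torch.rand(8, 3, 84, 84), torch.rand(8, 3, 84, 84)
    print(feature_adversarial_step(enc, disc, enc_opt, disc_opt, src, tgt))
```

In this setup the discriminator learns to tell the two domains apart while the encoder is trained to fool it, so the encoded source and target observations converge toward a common, domain-invariant representation on which the second-stage policy can then be trained.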

