Abstract

While deep reinforcement learning (DRL) has achieved great success in some large domains, most related algorithms assume that the state of the underlying system is fully observable. However, many real-world problems are only partially observable. For systems with continuous observations, most related algorithms, e.g., the deep Q-network (DQN) and the deep recurrent Q-network (DRQN), use histories of observations to represent states; however, this is often computationally expensive and ignores the information carried by actions. Predictive state representations (PSRs) offer a powerful framework for modelling partially observable dynamical systems with discrete or continuous state spaces, representing the latent state entirely in terms of observable actions and observations. In this paper, we present a PSR model-based DQN approach that combines the strengths of the PSR model and DQN planning. We use a recurrent network to build the recurrent PSR model, which can fully capture the dynamics of a partially observable environment with continuous observations. The model is then used for state representation and state update in DQN, so that DQN no longer relies on a fixed number of history observations or a recurrent neural network (RNN) to represent states in partially observable environments. The strong performance of the proposed approach is demonstrated on a set of robotic control tasks from OpenAI Gym by comparison with the memory-based DRQN and the state-of-the-art recurrent predictive state policy (RPSP) networks. Source code is available at https://github.com/RPSR-DQN/paper-code.git.
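At a high level, the approach stacks a recurrent model that maps the action-observation stream to a predictive state on top of a feed-forward Q-network that scores discrete actions from that state. The PyTorch sketch below is illustrative only: the module names (RecurrentPSREncoder, QNetwork), the GRU-based encoder, and the layer sizes are our assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class RecurrentPSREncoder(nn.Module):
    """Hypothetical stand-in for the recurrent PSR model: a GRU whose hidden
    vector plays the role of the predictive state and is updated from the
    fully observable stream of actions and observations."""
    def __init__(self, obs_dim, act_dim, state_dim=64):
        super().__init__()
        self.gru = nn.GRUCell(obs_dim + act_dim, state_dim)

    def forward(self, obs, act, prev_state):
        # One-step predictive-state update from the latest action/observation.
        return self.gru(torch.cat([obs, act], dim=-1), prev_state)

class QNetwork(nn.Module):
    """Feed-forward Q-network that takes the predictive state, not a window
    of raw history observations, as its input."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, psr_state):
        return self.net(psr_state)  # one Q-value per discrete action
```

Because the state update lives entirely in the recurrent encoder, the Q-network itself can remain a plain DQN head, which is the separation the abstract describes.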

Highlights

  • For agents operating in stochastic domains, how to determine the optimal policy is a central and challenging issue

  • We mainly focus on systems with continuous observations; there are two main approaches for dealing with the partially observable problem in such domains

  • A central problem in artificial intelligence is for agents to find optimal policies in stochastic, partially observable environments, which is a ubiquitous and challenging problem in science and engineering [16]. The commonly used technique for solving such partially observable problems is to first model the dynamics of the environment using the partially observable Markov decision process (POMDP) approach or the predictive state representations (PSRs) approach [3, 12]; the problem can then be solved using the obtained model

Summary

Introduction

For agents operating in stochastic domains, how to determine the (near) optimal policy is a central and challenging issue. With the benefits of the PSR approach and the great success of the deep Q-network in some real-world applications, we propose the RPSR-DQN approach. First, a recurrent PSR model of the underlying partially observable system is built; the true state, namely the PSR state or belief state, can then be updated and provides sufficient information for DQN planning. The tuple of ⟨currentPSRstate, action, reward, nextPSRstate⟩, where currentPSRstate is the information of the current state and nextPSRstate is the information of the state obtained by taking the action under the current state, is stored and used as the data for training the deep Q-network, as sketched below. Experimental results show that, with the benefits of the DQN framework and the separation of model learning from policy training, our approach outperforms the state-of-the-art baselines
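As a concrete illustration of the data flow described above, the following minimal sketch stores transitions in terms of predictive states and then applies the standard DQN target to them; the buffer layout, discount factor, and function names are illustrative assumptions rather than the paper's released code.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

# Replay buffer of <currentPSRstate, action, reward, nextPSRstate, done> tuples.
# Because the PSR model already summarizes the history, each entry can be
# treated like a fully observable MDP transition.
buffer = deque(maxlen=100_000)

def store(psr_state, action, reward, next_psr_state, done):
    buffer.append((psr_state, action, reward, next_psr_state, done))

def dqn_update(q_net, target_net, optimizer, batch_size=32, gamma=0.99):
    """One standard DQN step on predictive-state transitions (sketch)."""
    batch = random.sample(buffer, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    s  = torch.stack([torch.as_tensor(x, dtype=torch.float32) for x in states])
    s2 = torch.stack([torch.as_tensor(x, dtype=torch.float32) for x in next_states])
    a  = torch.as_tensor(actions, dtype=torch.int64)
    r  = torch.as_tensor(rewards, dtype=torch.float32)
    d  = torch.as_tensor(dones, dtype=torch.float32)

    q = q_net(s).gather(1, a.view(-1, 1)).squeeze(1)                    # Q(s, a)
    with torch.no_grad():
        target = r + gamma * (1.0 - d) * target_net(s2).max(1).values   # Bellman target
    loss = F.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Learning the recurrent PSR model and running this Q-learning update are kept as separate stages, matching the division between model learning and policy training described above.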

Related Work
RPSR-DQN
Experiments
Findings
Conclusion