Abstract

The deep Q-network (DQN) has attracted increasing attention from both industry and academia. Existing methods mostly formulate the decision process as discrete agent-environment interactions, while the intervals between successive interactions are largely neglected, even though they may reveal important signals in real-world applications. To bridge this gap, this paper proposes to explicitly model the time intervals in DQN. Specifically, we first cast the agent-environment interactions onto a continuous time dimension, and then define a time-aware learning objective and the corresponding Bellman operator. For sample-efficient training, we approximate the Q-function with a neural network, where the time information is modeled by a point process. The intensity function of the point process and the Q-function are seamlessly integrated by sharing the same history summarization module, so that the time-interval information can directly influence the model optimization process. To close the gap between the approximated and optimal Q-function, we theoretically analyze the sample complexity of our model by deriving a finite-time bound in continuous time. We conduct both simulation and real-world experiments to demonstrate our model's effectiveness.
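
Below is a minimal sketch of the shared-encoder idea described in the abstract: a history summarization module encodes the interaction history (including time intervals), and both a Q-value head and a point-process intensity head read from that shared summary. The module names, the GRU encoder, and all dimensions are illustrative assumptions, not the authors' implementation.

```python
# Sketch (not the authors' code): a shared history encoder feeding both the
# Q-value head and the point-process intensity head, so time-interval
# information can influence Q-function optimization. All names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TimeAwareQNetwork(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, hidden_dim: int = 64):
        super().__init__()
        # Shared history summarization module: consumes (state, action, interval)
        # triples and produces a history embedding.
        self.history_encoder = nn.GRU(
            input_size=state_dim + num_actions + 1,  # +1 for the time interval
            hidden_size=hidden_dim,
            batch_first=True,
        )
        # Q-head: maps the shared summary to one Q-value per action.
        self.q_head = nn.Linear(hidden_dim, num_actions)
        # Intensity head: maps the same summary to a positive intensity value.
        self.intensity_head = nn.Linear(hidden_dim, 1)

    def forward(self, history: torch.Tensor):
        # history: (batch, seq_len, state_dim + num_actions + 1)
        _, h_n = self.history_encoder(history)
        summary = h_n[-1]                                      # (batch, hidden_dim)
        q_values = self.q_head(summary)                        # (batch, num_actions)
        intensity = F.softplus(self.intensity_head(summary))   # (batch, 1), > 0
        return q_values, intensity


if __name__ == "__main__":
    # Toy usage: a batch of 4 histories, each with 10 interactions.
    net = TimeAwareQNetwork(state_dim=8, num_actions=3)
    histories = torch.randn(4, 10, 8 + 3 + 1)
    q, lam = net(histories)
    print(q.shape, lam.shape)  # torch.Size([4, 3]) torch.Size([4, 1])
```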
