Abstract

The deep Q-network (DQN) has attracted increasing attention from both industry and academia. Existing methods mostly formulate the decision process as a sequence of discrete agent-environment interactions, while the intervals between successive interactions, which may carry important signals in real-world applications, are largely neglected. To bridge this gap, this paper proposes to explicitly model the time intervals in DQN. Specifically, we first cast the agent-environment interactions onto a continuous time dimension, and then define a time-aware learning objective and the corresponding Bellman operator. For sample-efficient training, we approximate the Q-function with a neural network, where time information is modeled by a temporal point process. The intensity function of the point process and the Q-function are seamlessly integrated by sharing the same history summarization module, so that time-interval information can directly influence the model optimization process. To close the gap between the approximated and optimal Q-function, we theoretically analyze the sample complexity of our model by deriving a finite-time bound in continuous time. We conduct both simulation and real-world experiments to demonstrate our model's effectiveness.
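As a rough illustration of the shared-module idea described above, the sketch below shows one plausible way a history summarization module could feed both a Q-function head and a point-process intensity head. This is only a minimal sketch under assumed details (GRU encoder, layer sizes, softplus-constrained intensity); the paper's actual architecture is not specified in the abstract.

```python
import torch
import torch.nn as nn

class TimeAwareQNetwork(nn.Module):
    """Illustrative sketch (not the paper's exact architecture): a shared
    history-summarization module feeds both the Q-function head and the
    point-process intensity head, so time-interval information can influence
    the Q-learning update. All layer choices and sizes are hypothetical."""

    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Shared module: summarizes the history of (state, action, time-interval) events.
        self.history_encoder = nn.GRU(
            input_size=state_dim + action_dim + 1,  # +1 for the elapsed time interval
            hidden_size=hidden_dim,
            batch_first=True,
        )
        # Q-function head: maps the history summary to action values.
        self.q_head = nn.Linear(hidden_dim, action_dim)
        # Intensity head: conditional intensity of the temporal point process,
        # kept positive via softplus.
        self.intensity_head = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Softplus())

    def forward(self, event_sequence: torch.Tensor):
        # event_sequence: (batch, seq_len, state_dim + action_dim + 1)
        summary, _ = self.history_encoder(event_sequence)
        last = summary[:, -1, :]               # summary of the interaction history so far
        q_values = self.q_head(last)           # Q(h, a) for each discrete action
        intensity = self.intensity_head(last)  # lambda(t | h): rate of the next interaction
        return q_values, intensity
```

In such a design, both heads backpropagate into the same encoder, which is one way the time-interval likelihood could shape the representation used by the Q-function.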
