Scalable Methods for Computing State Similarity in Deterministic Markov Decision Processes

Pablo Samuel Castro

doi:10.1609/aaai.v34i06.6564

Abstract

We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Processes (MDPs). Bisimulation metrics are an elegant formalism that captures behavioral equivalence between states and provides strong theoretical guarantees on differences in optimal behavior. Unfortunately, their computation is expensive and requires a tabular representation of the states, which has thus far rendered them impractical for large problems. In this paper we present a new version of the metric that is tied to a behavior policy in an MDP, along with an analysis of its theoretical properties. We then present two new algorithms for approximating bisimulation metrics in large, deterministic MDPs. The first does so via sampling and is guaranteed to converge to the true metric. The second is a differentiable loss that allows us to learn an approximation even for continuous-state MDPs, which had not been possible prior to this work.
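To make the object of study concrete, here is a minimal sketch of the exact, tabular computation that the abstract describes as expensive. It is not one of the paper's algorithms. It assumes the standard bisimulation-metric fixed point, d(s, t) = max_a |R(s, a) - R(t, a)| + gamma * W(d)(P(s, a), P(t, a)), and uses the fact that with deterministic transitions the Kantorovich term W reduces to d evaluated at the two successor states. The function name, the array layout (`rewards`, `next_state`), and the default parameters are all illustrative assumptions.

```python
import numpy as np


def bisimulation_metric(rewards, next_state, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Fixed-point iteration for the bisimulation metric of a deterministic MDP.

    rewards[s, a]    : reward for taking action a in state s (hypothetical layout)
    next_state[s, a] : integer index of the unique successor state
    Returns an (n_states, n_states) array of pairwise distances.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_state = np.asarray(next_state, dtype=int)
    n_states, n_actions = rewards.shape

    d = np.zeros((n_states, n_states))
    for _ in range(max_iters):
        new_d = np.zeros_like(d)
        for a in range(n_actions):
            # |R(s, a) - R(t, a)| for every pair of states (s, t).
            reward_gap = np.abs(rewards[:, None] [:, :, a] - rewards[None, :][:, :, a])
            succ = next_state[:, a]
            # Deterministic transitions: the Kantorovich term collapses to
            # the current distance between the two successor states.
            candidate = reward_gap + gamma * d[np.ix_(succ, succ)]
            # The metric takes the maximum over actions.
            new_d = np.maximum(new_d, candidate)
        if np.max(np.abs(new_d - d)) < tol:
            return new_d
        d = new_d
    return d


# Toy example: three absorbing states, one action.
# States 0 and 1 both earn reward 1 forever; state 2 earns 0 forever.
toy_rewards = [[1.0], [1.0], [0.0]]
toy_next = [[0], [1], [2]]
toy_metric = bisimulation_metric(toy_rewards, toy_next, gamma=0.9)
```

In the toy example, states 0 and 1 are behaviorally equivalent, so their distance is 0, while the distance between states 0 and 2 approaches 1 / (1 - 0.9) = 10. Each sweep touches every pair of states for every action, which illustrates why a tabular computation of this kind does not scale to large or continuous state spaces.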

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Apr 3, 2020
Citations: 51

Similar Papers

Convergence of the Q-ae learning under deterministic MDPs and its efficiency under the stochastic environment
Gang Zhao ... S Tatsumi
08 Oct 2000

Representation Discovery for MDPs Using Bisimulation Metrics
Sherry Ruan ... Prakash Panangaden
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 29
04 Mar 2015

Automatic Construction of Temporally Extended Actions for MDPs Using Bisimulation Metrics
Pablo Samuel Castro ... Doina Precup
01 Jan 2012

Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns
Yong Liu ... Changjie Fan
01 Aug 2019
