Learning Large Graph-Based MDPs With Historical Data

Ravi N Haksar, Mac Schwager

doi:10.1109/tcns.2021.3128530

Abstract

We consider learning the dynamics and measurement model parameters of a graph-based Markov decision process (GMDP) given a history of measurements. Graph-based models have been used in modeling many data-based applications, such as recognition tasks, disease epidemics, forest wildfires, freeway traffic, and social networks. We leverage the expectation–maximization framework and develop an algorithm that optimizes the measurement likelihood and has favorable complexity for large models. In contrast to prior work, we directly consider GMDPs with significantly large discrete state spaces, arbitrary coupling structure, and long measurement sequences. We also consider a special structural property called Anonymous Influence, which we use to test hypotheses and gain insights into the data. We demonstrate the effectiveness of our learning algorithm by considering two real-world data sets on the 2020 Novel Coronavirus (COVID-19) pandemic in California and on user interactions on Twitter. Our results show that the learned GMDP models better explain the data compared to an uncoupled model assumption.

Full Text