On Solving MDPs With Large State Space: Exploitation of Policy Structures and Spectral Properties

Libin Liu, Urbashi Mitra, Arpan Chattopadhyay

doi:10.1109/tcomm.2019.2899620

Open Access

https://doi.org/10.1109/tcomm.2019.2899620

Abstract

In this paper, a point-to-point network transmission control problem is formulated as a Markov decision process (MDP). Classical dynamic programming techniques such as value iteration, policy iteration, and linear programming can be employed to solve the optimization problem, but they suffer from high computational complexity in networks with large state spaces. To reduce complexity, the structure of the optimal policy can be exploited and incorporated into standard algorithms. Function approximation can also be applied, where the value function is approximated by a linear combination of basis vectors spanning a lower-dimensional subspace. The main challenge for function approximation lies in the absence of general guidelines for subspace construction. In this paper, a proper subspace for projection is first generated based on system information, and more general construction methods are then proposed using tools from graph signal processing (GSP). Graph symmetrization methods are also used to handle the directed nature of the probability transition graph, so that the well-developed GSP theory for undirected graphs can be employed. Numerical results for a typical wireless system show that standard algorithms with structural information incorporated can achieve a 50% complexity reduction without performance loss. The subspace generated from the system can achieve zero policy error with faster runtime, and the GSP approach can also provide a proper subspace for perfect reconstruction of the optimal policy. It is also shown how the proposed method can be applied to other MDP problems.
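
The ideas summarized in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It runs value iteration on a small random MDP, symmetrizes the policy-induced transition graph, and projects the value function onto the leading Laplacian eigenvectors. Everything in it is an assumption made for illustration: the toy MDP, the reward-maximizing formulation, the discount factor, the symmetrization W = (P + P^T) / 2 (one simple choice among several possible ones), and the function names value_iteration, symmetrized_laplacian_basis, and project.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Solve a discounted MDP. P: (A, S, S) transitions, R: (S, A) rewards."""
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


def symmetrized_laplacian_basis(P_pi, k):
    """First k Laplacian eigenvectors of the symmetrized transition graph."""
    # Assumed symmetrization: average the directed weights in both directions.
    W = 0.5 * (P_pi + P_pi.T)
    L = np.diag(W.sum(axis=1)) - W      # combinatorial graph Laplacian
    _, eigvecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return eigvecs[:, :k]


def project(V, basis):
    """Least-squares approximation of V by a linear combination of basis columns."""
    coeffs, *_ = np.linalg.lstsq(basis, V, rcond=None)
    return basis @ coeffs


# Hypothetical toy MDP: 6 states, 2 actions, random dynamics and rewards.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 6, 2, 0.9
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)       # make each row a distribution
R = rng.random((n_states, n_actions))

V, policy = value_iteration(P, R, gamma=gamma)

# Transition matrix under the greedy policy: row s is P[policy[s], s, :].
P_pi = P[policy, np.arange(n_states)]

# Approximation error as the subspace dimension k grows.
errors = [
    np.linalg.norm(V - project(V, symmetrized_laplacian_basis(P_pi, k)))
    for k in range(1, n_states + 1)
]
print(errors)
```

Because the eigenvector subspaces are nested, the error cannot increase with k, and it vanishes once k equals the number of states. The practical question raised in the abstract is how small k can be made while the greedy policy derived from the approximate value function still matches the optimal one.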

Full Text